2003 December




From angel at miami.edu Mon Dec 1 10:25:34 2003
From: angel at miami.edu (Angel Li)
Date: Mon, 01 Dec 2003 13:25:34 -0500
Subject: [Rocks-Discuss]cluster-fork
Message-ID: <[email protected]>

Hi,

I recently installed Rocks 3.0 on a Linux cluster and when I run the command "cluster-fork" I get this error:

apple* cluster-fork ls
Traceback (innermost last):
  File "/opt/rocks/sbin/cluster-fork", line 88, in ?
    import rocks.pssh
  File "/opt/rocks/lib/python/rocks/pssh.py", line 96, in ?
    import gmon.encoder
ImportError: Bad magic number in /usr/lib/python1.5/site-packages/gmon/encoder.pyc

Any thoughts? I'm also wondering where to find the python sources for files in /usr/lib/python1.5/site-packages/gmon.

Thanks,

Angel
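The "Bad magic number" error means the first four bytes of the .pyc file do not match the bytecode format expected by the interpreter importing it, which typically happens when the file was compiled by a different Python version. A minimal check, sketched with a modern Python's importlib for illustration (Python 1.5 exposed the same value via imp.get_magic()):

```python
import importlib.util


def pyc_magic_matches(pyc_path):
    """Compare a .pyc file's 4-byte magic number against the running
    interpreter's expected value; a mismatch is what raises
    "ImportError: Bad magic number"."""
    with open(pyc_path, "rb") as f:
        return f.read(4) == importlib.util.MAGIC_NUMBER


# Demo: compile a trivial module and confirm its magic number matches.
import os, py_compile, tempfile

src = os.path.join(tempfile.mkdtemp(), "demo.py")
with open(src, "w") as f:
    f.write("x = 1\n")
print(pyc_magic_matches(py_compile.compile(src)))  # True
```

Running the same check against a .pyc left behind by a different interpreter version would print False, matching the symptom in this thread.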

From jghobrial at uh.edu Mon Dec 1 11:35:06 2003
From: jghobrial at uh.edu (Joseph)
Date: Mon, 1 Dec 2003 13:35:06 -0600 (CST)
Subject: [Rocks-Discuss]cluster-fork
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

On Mon, 1 Dec 2003, Angel Li wrote:

Hello Angel, I have the same problem, and so far there has been no response since I posted about this a month ago.

Is your frontend an AMD setup??

I am thinking this is an AMD problem.

Thanks,
Joseph


From tim.carlson at pnl.gov Mon Dec 1 14:58:54 2003
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Mon, 01 Dec 2003 14:58:54 -0800 (PST)
Subject: [Rocks-Discuss]odd kickstart problem
In-Reply-To: <[email protected]>
Message-ID: <[email protected]>

Trying to bring up an old dead node on a Rocks 2.3.2 cluster and I get the following error in /var/log/httpd/error_log:

Traceback (innermost last):
  File "/opt/rocks/sbin/kgen", line 530, in ?
    app.run()
  File "/opt/rocks/sbin/kgen", line 497, in run
    doc = FromXmlStream(file)
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax2.py", line 386, in FromXmlStream
    return reader.fromStream(stream, ownerDocument)
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax2.py", line 372, in fromStream
    self.parser.parse(s)
  File "/usr/lib/python1.5/site-packages/xml/sax/expatreader.py", line 58, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/lib/python1.5/site-packages/xml/sax/xmlreader.py", line 125, in parse
    self.close()
  File "/usr/lib/python1.5/site-packages/xml/sax/expatreader.py", line 154, in close
    self.feed("", isFinal = 1)
  File "/usr/lib/python1.5/site-packages/xml/sax/expatreader.py", line 148, in feed
    self._err_handler.fatalError(exc)
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax2.py", line 340, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: <stdin>:3298:0: no element found

Doing a wget of http://frontend-0/install/kickstart.cgi\?arch=i386\&np=2\&project=rocks on one of the working internal nodes yields the same error.

Any thoughts on this?


I've also done a fresh "rocks-dist dist".

Tim

From sjenks at uci.edu Mon Dec 1 15:35:54 2003
From: sjenks at uci.edu (Stephen Jenks)
Date: Mon, 1 Dec 2003 15:35:54 -0800
Subject: [Rocks-Discuss]cluster-fork
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]>
Message-ID: <[email protected]>

FYI, I have a dual Athlon frontend and didn't have that problem. I know that doesn't exactly help you, but at least it doesn't fail on all AMD machines.

It looks like the .pyc file might be corrupt in your installation. The source .py file (encoder.py) is in the /usr/lib/python1.5/site-packages/gmon directory, so perhaps removing the .pyc file would regenerate it (if you run cluster-fork as root?)

The md5sum for encoder.pyc on my system is:

459c78750fe6e065e9ed464ab23ab73d  encoder.pyc

So you can check if yours is different.

Steve Jenks
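Steve's suggestion can be tried in isolation: importing a module compiles and caches its bytecode. A small sketch (the module name encmod is a stand-in, not from the thread; modern interpreters cache under __pycache__/ rather than writing encoder.pyc next to the source as Python 1.5 did):

```python
import importlib, os, sys, tempfile

# Create a throwaway module on disk (hypothetical stand-in for encoder.py).
d = tempfile.mkdtemp()
with open(os.path.join(d, "encmod.py"), "w") as f:
    f.write("VALUE = 42\n")

sys.path.insert(0, d)
mod = importlib.import_module("encmod")  # import compiles and caches bytecode
print(mod.VALUE)                         # 42

# Modern interpreters write the cache under __pycache__/; Python 1.5
# wrote encmod.pyc alongside the source instead.
print(os.path.isdir(os.path.join(d, "__pycache__")))
```

So deleting a stale .pyc is safe as long as the .py source is present and the importing user can write to the directory.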

On Dec 1, 2003, at 11:35 AM, Joseph wrote:


From mjk at sdsc.edu Mon Dec 1 19:03:16 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Mon, 1 Dec 2003 19:03:16 -0800
Subject: [Rocks-Discuss]odd kickstart problem
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

You'll need to run the kpp and kgen steps (what kickstart.cgi does for you) manually to find out if this is an XML error.

# cd /home/install/profiles/current
# kpp compute

This will generate a kickstart file for a compute node, although some information will be missing since it isn't specific to a particular node (unlike what ./kickstart.cgi --client=node-name generates). What this does do is traverse the XML graph and build a monolithic XML kickstart profile. If this step works, you can then pipe ("|") the output into kgen to convert the XML to kickstart syntax. Something in this procedure should fail and point to the error.

-mjk
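The "no element found" failure Tim reported is what expat raises when it is handed an empty or truncated document, which is why a broken kpp stage surfaces as a SAX error inside kgen. A minimal reproduction with the standard xml.sax API:

```python
import xml.sax
import xml.sax.handler
from io import StringIO

# Feeding expat an empty document triggers the same parse error kgen hit
# when the XML stream it was reading died early (e.g. a missing node file).
try:
    xml.sax.parse(StringIO(""), xml.sax.handler.ContentHandler())
except xml.sax.SAXParseException as e:
    print(e.getMessage())  # no element found
```

Running kpp by itself, as Mason suggests, separates "the XML graph is broken" from "kgen mis-parsed good XML": if kpp's output is empty or truncated, this is exactly the exception the downstream parser produces.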

On Dec 1, 2003, at 2:58 PM, Tim Carlson wrote:


From tim.carlson at pnl.gov Mon Dec 1 20:42:51 2003
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Mon, 01 Dec 2003 20:42:51 -0800 (PST)
Subject: [Rocks-Discuss]odd kickstart problem
In-Reply-To: <[email protected]>
Message-ID: <[email protected]>

On Mon, 1 Dec 2003, Mason J. Katz wrote:

> You'll need to run the kpp and kgen steps (what kickstart.cgi does for
> you) manually to find if this is an XML error.
>
> # cd /home/install/profiles/current
> # kpp compute

That was the trick. This sent me down the correct path. I had uninstalled SGE on the frontend (I was having problems with SGE and wanted to start from scratch).

Adding the 2 SGE XML files back to /home/install/profiles/2.3.2/nodes/ fixed everything.

Thanks!

Tim


From landman at scalableinformatics.com Tue Dec 2 04:15:07 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Tue, 02 Dec 2003 07:15:07 -0500
Subject: [Rocks-Discuss]supermicro based MB's
Message-ID: <[email protected]>

Folks:

Working on integrating a Supermicro MB based cluster. Discovered early on that all of the compute nodes have an Intel based NIC that RedHat doesn't know anything about (any version of RH). Some of the administrative nodes have other, similar issues. I am seeing a surprising amount of misdetected or undetected hardware across the collection of MBs.

Anyone have advice on where to get modules/module source for Redhat for these things? It looks like I will need to rebuild the boot CD, though the several times I have tried this previously have failed to produce a working/bootable system. It looks like new modules need to be created/inserted into the boot process (head node and cluster nodes) kernels, as well as into the installable kernels.

Has anyone done this for a Supermicro MB based system? Thanks.

Joe

-- 
Joseph Landman, Ph.D
Scalable Informatics LLC
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
phone: +1 734 612 4615

From jghobrial at uh.edu Tue Dec 2 08:28:08 2003
From: jghobrial at uh.edu (Joseph)
Date: Tue, 2 Dec 2003 10:28:08 -0600 (CST)
Subject: [Rocks-Discuss]cluster-fork
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]> <[email protected]>
Message-ID: <[email protected]>

Indeed my md5sum is different for encoder.pyc. However, when I pulled the file and ran "cluster-fork", Python complained about an import problem. So it seems that regeneration did not occur. Is there a flag I need to pass?

I have also tried to figure out what package provides encoder and reinstall the package, but an rpm query reveals nothing.

If this is a generated file, what generates it?

It seems that an rpm file query on ganglia shows that the files in the directory belong to the package, but encoder.pyc does not.

Thanks,


Joseph

On Mon, 1 Dec 2003, Stephen Jenks wrote:

From angel at miami.edu Tue Dec 2 09:02:55 2003
From: angel at miami.edu (Angel Li)
Date: Tue, 02 Dec 2003 12:02:55 -0500
Subject: [Rocks-Discuss]cluster-fork
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]> <[email protected]> <[email protected]>
Message-ID: <[email protected]>

Joseph wrote:

> Indeed my md5sum is different for encoder.pyc. However, when I pulled the
> file and run "cluster-fork" python responds about an import problem. So it
> seems that regeneration did not occur. Is there a flag I need to pass?

I have finally found the python sources in the HPC rolls CD, filename ganglia-python-3.0.0-2.i386.rpm. I'm not familiar with python but it seems python "compiles" the .py files to ".pyc" and then deletes the source file the first time they are referenced? I also noticed that there are two versions of python installed. Maybe the pyc files from one version won't load into the other one?

Angel

From mjk at sdsc.edu Tue Dec 2 15:52:52 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Tue, 2 Dec 2003 15:52:52 -0800
Subject: [Rocks-Discuss]cluster-fork
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]> <[email protected]> <[email protected]> <[email protected]>
Message-ID: <[email protected]>

Python creates the .pyc files for you, and does not remove the original .py file. I would be extremely surprised if two "identical" .pyc files had the same md5 checksum. I'd expect this to be more like a C .o file, which always contains random data to pad out to the end of a page and 32/64-bit word sizes. Still, this is just a guess; the real point is you can always remove the .pyc files and the .py will regenerate them when imported (although standard UNIX file/dir permissions still apply).

What is the import error you get from cluster-fork?

-mjk
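Mason's guess about why the checksums differ can be made concrete: the .pyc header embeds the source file's modification time, so two compiles of identical source made at different times differ at the byte level even when the bytecode itself is identical. A sketch against the modern 16-byte header (magic, flags, mtime, size; the Python 1.5-era header was just magic + mtime):

```python
import os, py_compile, struct, tempfile

# Compile a trivial module and inspect the bytecode cache header.
src = os.path.join(tempfile.mkdtemp(), "demo.py")
with open(src, "w") as f:
    f.write("x = 1\n")
pyc = py_compile.compile(src)

with open(pyc, "rb") as f:
    magic = f.read(4)                                # bytecode magic number
    flags, mtime, size = struct.unpack("<III", f.read(12))

print(size)                                 # 6 -- length of "x = 1\n"
print(mtime == int(os.stat(src).st_mtime))  # True: header records source mtime
```

Since the header carries the source's timestamp, comparing md5sums of .pyc files across machines (as tried earlier in the thread) is unreliable; comparing the magic number, or the md5sum of the .py source, is the meaningful check.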

On Dec 2, 2003, at 9:02 AM, Angel Li wrote:


From vrowley at ucsd.edu Mon Dec 1 14:27:03 2003
From: vrowley at ucsd.edu (V. Rowley)
Date: Mon, 01 Dec 2003 14:27:03 -0800
Subject: [Rocks-Discuss]PXE boot problems
Message-ID: <[email protected]>

We have installed a ROCKS 3.0.0 frontend on a DL380 and are trying to install a compute node via PXE. We are getting an error similar to the one mentioned in the archives, e.g.

> Loading initrd.img....
> Ready
>
> Failed to free base memory


We have upgraded to syslinux-2.07-1, per the suggestion in the archives, but continue to get the same error. Any ideas?

-- 
Vicky Rowley                                email: vrowley at ucsd.edu
Biomedical Informatics Research Network    work: (858) 536-5980
University of California, San Diego        fax: (858) 822-0828
9500 Gilman Drive
La Jolla, CA 92093-0715

See pictures from our trip to China at http://www.sagacitech.com/Chinaweb

From naihh at imcb.a-star.edu.sg Tue Dec 2 18:50:55 2003
From: naihh at imcb.a-star.edu.sg (Nai Hong Hwa Francis)
Date: Wed, 3 Dec 2003 10:50:55 +0800
Subject: [Rocks-Discuss]RE: When will Sun Grid Engine be included in Rocks 3 for Itanium?
Message-ID: <5E118EED7CC277468A275F11EEEC39B94CCC22@EXIMCB2.imcb.a-star.edu.sg>

Hi Laurence,

I just downloaded the Rocks 3.0 for IA32 and installed it, but SGE is still not working.

Any idea?

Nai Hong Hwa Francis
Institute of Molecular and Cell Biology (A*STAR)
30 Medical Drive
Singapore 117609.
DID: (65) 6874-6196

-----Original Message-----
From: Laurence Liew [mailto:laurence at scalablesys.com]
Sent: Thursday, November 20, 2003 2:53 PM
To: Nai Hong Hwa Francis
Cc: npaci-rocks-discussion at sdsc.edu
Subject: Re: [Rocks-Discuss]RE: When will Sun Grid Engine be included in Rocks 3 for Itanium?

Hi Francis

GridEngine roll is ready for ia32. We will get an ia64 native version ready as soon as we get back from SC2003. It will be released in a few weeks' time.

Globus GT2.4 is included in the Grid Roll

Cheers!
Laurence

On Thu, 2003-11-20 at 10:13, Nai Hong Hwa Francis wrote:

> Hi,
>
> Does anyone have any idea when will Sun Grid Engine be included as part
> of Rocks 3 distribution.
>
> I am a newbie to Grid Computing.
> Anyone have any idea on how to invoke Globus in Rocks to setup a Grid?
>
> Regards
>
> Nai Hong Hwa Francis
>
> Institute of Molecular and Cell Biology (A*STAR)
> 30 Medical Drive
> Singapore 117609
> DID: 65-6874-6196
>
> -----Original Message-----
> From: npaci-rocks-discussion-request at sdsc.edu
> [mailto:npaci-rocks-discussion-request at sdsc.edu]
> Sent: Thursday, November 20, 2003 4:01 AM
> To: npaci-rocks-discussion at sdsc.edu
> Subject: npaci-rocks-discussion digest, Vol 1 #613 - 3 msgs
>
> Today's Topics:
>
>   1. top500 cluster installation movie (Greg Bruno)
>   2. Re: Running Normal Application on Rocks Cluster -
>      Newbie Question (Laurence Liew)
>
> Message: 1
> To: npaci-rocks-discussion at sdsc.edu
> From: Greg Bruno <bruno at rocksclusters.org>
> Date: Tue, 18 Nov 2003 13:41:15 -0800
> Subject: [Rocks-Discuss]top500 cluster installation movie
>
> here's a crew of 7, installing the 201st fastest supercomputer in the
> world in under two hours on the showroom floor at SC 03:
>
> http://www.rocksclusters.org/rocks.mov
>
> warning: the above file is ~65MB.
>
> - gb
>
> Message: 2
> Subject: Re: [Rocks-Discuss]Running Normal Application on Rocks Cluster -
> Newbie Question
> From: Laurence Liew <laurenceliew at yahoo.com.sg>
> To: Leong Chee Shian <chee-shian.leong at schenker.com>
> Cc: npaci-rocks-discussion at sdsc.edu
> Date: Wed, 19 Nov 2003 12:31:18 +0800
>
> Chee Shian,
>
> Thanks for your call. We will take this off list and visit you next week
> in your office as you requested.
>
> Cheers!
> laurence
>
> > On Tue, 2003-11-18 at 17:29, Leong Chee Shian wrote:
> > I have just installed Rocks 3.0 with one frontend and two compute
> > node.
> >
> > A normal file based application is installed on the frontend and is
> > NFS shared to the compute nodes.
> >
> > Question is: When run 5 sessions of my applications, the CPU
> > utilization is all concentrated on the frontend node, nothing is
> > being passed on to the compute nodes. How do I make these 3 computers
> > to function as one and share the load?
> >
> > Thanks everyone as I am really new to this clustering stuff..
> >
> > PS: The idea of exploring rocks cluster is to use a few inexpensive
> > intel machines to replace our existing multi CPU sun server,
> > suggestions and recommendations are greatly appreciated.
> >
> > Leong

-- 
Laurence Liew
CTO, Scalable Systems Pte Ltd
7 Bedok South Road
Singapore 469272
Tel : 65 6827 3953
Fax : 65 6827 3922
Mobile: 65 9029 4312
Email : laurence at scalablesys.com
http://www.scalablesys.com

DISCLAIMER: This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its contents to any other person as it may be an offence under the Official Secrets Act. Thank you.

From laurence at scalablesys.com Tue Dec 2 19:10:08 2003
From: laurence at scalablesys.com (Laurence Liew)
Date: Wed, 03 Dec 2003 11:10:08 +0800
Subject: [Rocks-Discuss]RE: When will Sun Grid Engine be included in Rocks 3 for Itanium?
In-Reply-To: <5E118EED7CC277468A275F11EEEC39B94CCC22@EXIMCB2.imcb.a-star.edu.sg>
References: <5E118EED7CC277468A275F11EEEC39B94CCC22@EXIMCB2.imcb.a-star.edu.sg>
Message-ID: <1070421007.2452.51.camel@scalable>

Hi,

SGE is in the SGE roll.

You need to download the base, hpc and sge roll.

The install is now different from V2.3.x

Cheers!
laurence

On Wed, 2003-12-03 at 10:50, Nai Hong Hwa Francis wrote:

> Hi Laurence,



From DGURGUL at PARTNERS.ORG Wed Dec 3 07:24:29 2003
From: DGURGUL at PARTNERS.ORG (Gurgul, Dennis J.)
Date: Wed, 3 Dec 2003 10:24:29 -0500
Subject: [Rocks-Discuss]RE: When will Sun Grid Engine be included in Rocks 3 for Itanium?
Message-ID: <BC447F1AD529D311B4DE0008C71BF2EB0AE157F7@phsexch7.mgh.harvard.edu>

Where do we find the SGE roll? Under Lhoste at http://rocks.npaci.edu/Rocks/ there is a "Grid" roll listed. Is SGE in that? The userguide doesn't mention SGE.

Dennis J. Gurgul
Partners Health Care System
Research Management
Research Computing Core
617.724.3169

-----Original Message-----
From: npaci-rocks-discussion-admin at sdsc.edu
[mailto:npaci-rocks-discussion-admin at sdsc.edu] On Behalf Of Laurence Liew
Sent: Tuesday, December 02, 2003 10:10 PM
To: Nai Hong Hwa Francis
Cc: npaci-rocks-discussion at sdsc.edu
Subject: RE: [Rocks-Discuss]RE: When will Sun Grid Engine be included in Rocks 3 for Itanium?


From bruno at rocksclusters.org Wed Dec 3 07:32:14 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Wed, 3 Dec 2003 07:32:14 -0800
Subject: [Rocks-Discuss]RE: When will Sun Grid Engine be included in Rocks 3 for Itanium?
In-Reply-To: <BC447F1AD529D311B4DE0008C71BF2EB0AE157F7@phsexch7.mgh.harvard.edu>
References: <BC447F1AD529D311B4DE0008C71BF2EB0AE157F7@phsexch7.mgh.harvard.edu>
Message-ID: <[email protected]>

> Where do we find the SGE roll? Under Lhoste at
> http://rocks.npaci.edu/Rocks/
> there is a "Grid" roll listed. Is SGE in that? The userguide doesn't
> mention SGE.


the SGE roll will be available in the upcoming v3.1.0 release. scheduled release date is december 15th.

- gb

From jlkaiser at fnal.gov Wed Dec 3 08:35:18 2003
From: jlkaiser at fnal.gov (Joe Kaiser)
Date: Wed, 03 Dec 2003 10:35:18 -0600
Subject: [Rocks-Discuss]supermicro based MB's
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

Hi,

You don't say what version of Rocks you are using. The following is for the X5DPA-GG board and Rocks 3.0. It requires modifying only the pcitable in the boot image on the tftp server. I believe the procedure for 2.3.2 requires a heck of a lot more work (but it may not); I would have to dig deep for my notes about changing 2.3.2.

This should be done on the frontend:

cd /tftpboot/X86PC/UNDI/pxelinux/
cp initrd.img initrd.img.orig
cp initrd.img /tmp
cd /tmp
mv initrd.img initrd.gz
gunzip initrd.gz
mkdir /mnt/loop
mount -o loop initrd /mnt/loop
cd /mnt/loop/modules/
vi pcitable

Search for the e1000 drivers and add the following line:

0x8086  0x1013  "e1000"  "Intel Corp.|82546EB Gigabit Ethernet Controller"

Write the file, then:

cd /tmp
umount /mnt/loop
gzip initrd
mv initrd.gz initrd.img
mv initrd.img /tftpboot/X86PC/UNDI/pxelinux/

Then boot the node.
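The pcitable edit at the heart of the procedure above can be captured in a small script. This is only a sketch: it exercises the append against a scratch copy of the file rather than the real initrd, and the 82540EM sample row is invented for illustration; the 82546EB line is the one from the post.

```shell
#!/bin/sh
# Append the new PCI id -> driver mapping to pcitable, but only once,
# so re-running the script is harmless (idempotent).
workdir=$(mktemp -d)
pcitable="$workdir/pcitable"

# Stand-in for the real modules/pcitable inside the initrd.
printf '0x8086\t0x100e\t"e1000"\t"Intel Corp.|82540EM Gigabit Ethernet Controller"\n' > "$pcitable"

# The entry from the post (tab-separated fields).
entry="$(printf '0x8086\t0x1013\t"e1000"\t"Intel Corp.|82546EB Gigabit Ethernet Controller"')"
grep -q '0x1013' "$pcitable" || printf '%s\n' "$entry" >> "$pcitable"

grep -c '"e1000"' "$pcitable"    # two e1000 rows after the append
```

Against the real boot image you would run the same append between the `mount -o loop` and `umount` steps shown above.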

Hope this helps.

Thanks,

Joe

On Tue, 2003-12-02 at 06:15, Joe Landman wrote:


> Folks:
>
>   Working on integrating a Supermicro MB based cluster. Discovered early
> on that all of the compute nodes have an Intel based NIC that RedHat
> doesn't know anything about (any version of RH). Some of the
> administrative nodes have other similar issues. I am seeing simply a
> surprising number of mis/un detected hardware across the collection of MBs.
>
>   Anyone have advice on where to get modules/module source for Redhat
> for these things? It looks like I will need to rebuild the boot CD,
> though the several times I have tried this previously have failed to
> produce a working/bootable system. It looks like new modules need to be
> created/inserted into the boot process (head node and cluster nodes)
> kernels, as well as into the installable kernels.
>
>   Has anyone done this for a Supermicro MB based system? Thanks.
>
> Joe

-- 
===================================================================
Joe Kaiser - Systems Administrator

Fermi Lab CD/OSS-SCS          Never laugh at live dragons.
630-840-6444
jlkaiser at fnal.gov
===================================================================

From jghobrial at uh.edu Wed Dec 3 08:59:15 2003
From: jghobrial at uh.edu (Joseph)
Date: Wed, 3 Dec 2003 10:59:15 -0600 (CST)
Subject: [Rocks-Discuss]cluster-fork
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]>
	<[email protected]> <[email protected]>
	<[email protected]> <[email protected]>
Message-ID: <[email protected]>

Here is the error I receive when I remove the file encoder.pyc and run the command cluster-fork:

Traceback (innermost last):
  File "/opt/rocks/sbin/cluster-fork", line 88, in ?
    import rocks.pssh
  File "/opt/rocks/lib/python/rocks/pssh.py", line 96, in ?
    import gmon.encoder
ImportError: No module named encoder

Thanks,
Joseph

On Tue, 2 Dec 2003, Mason J. Katz wrote:

> Python creates the .pyc files for you, and does not remove the original


> .py file. I would be extremely surprised if two "identical" .pyc files
> had the same md5 checksum. I'd expect this to be more like a C .o file
> which always contains random data to pad out to the end of a page and
> 32/64 bit word sizes. Still this is just a guess; the real point is
> you can always remove the .pyc files and the .py will regenerate it
> when imported (although standard UNIX file/dir permissions still apply).
>
> What is the import error you get from cluster-fork?
>
> -mjk
>
> On Dec 2, 2003, at 9:02 AM, Angel Li wrote:
>
> > Joseph wrote:
> >
> >> Indeed my md5sum is different for encoder.pyc. However, when I pulled
> >> the file and run "cluster-fork" python responds about an import
> >> problem. So it seems that regeneration did not occur. Is there a flag
> >> I need to pass?
> >>
> >> I have also tried to figure out what package provides encoder and
> >> reinstall the package, but an rpm query reveals nothing.
> >>
> >> If this is a generated file, what generates it?
> >>
> >> It seems that an rpm file query on ganglia shows that files in the
> >> directory belong to the package, but encoder.pyc does not.
> >>
> >> Thanks,
> >> Joseph
> >>
> > I have finally found the python sources in the HPC rolls CD, filename
> > ganglia-python-3.0.0-2.i386.rpm. I'm not familiar with python but it
> > seems python "compiles" the .py files to ".pyc" and then deletes the
> > source file the first time they are referenced? I also noticed that
> > there are two versions of python installed. Maybe the pyc files from
> > one version won't load into the other one?
> >
> > Angel
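Mason's point about regeneration is easy to demonstrate. A sketch (the module name is invented; it assumes python3 is on PATH, and modern Pythons keep the bytecode cache under __pycache__ rather than beside the source as the Python 1.5 era did, but the bad-magic handling is the same in spirit — note it only works because the .py source is present, which was exactly what the Rocks gmon package lacked):

```shell
#!/bin/sh
# Corrupt a module's compiled bytecode, then show that importing the
# .py source still works: Python notices the bad magic number and
# recompiles from source instead of failing.
demo=$(mktemp -d)
echo 'VALUE = 42' > "$demo/encoder_demo.py"

python3 - "$demo" <<'EOF'
import py_compile, sys
d = sys.argv[1]
pyc = py_compile.compile(d + "/encoder_demo.py")   # write the cache
with open(pyc, "r+b") as f:
    f.write(b"\x00\x00\x00\x00")                   # clobber the magic number
sys.path.insert(0, d)
import encoder_demo                                # recompiled from the .py
print(encoder_demo.VALUE)
EOF
```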

From mjk at sdsc.edu Wed Dec 3 15:19:38 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Wed, 3 Dec 2003 15:19:38 -0800
Subject: [Rocks-Discuss]cluster-fork
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]>
	<[email protected]> <[email protected]>
	<[email protected]> <[email protected]>
	<[email protected]>
Message-ID: <[email protected]>

This file comes from a ganglia package; what does


# rpm -q ganglia-receptor

return?

-mjk

On Dec 3, 2003, at 8:59 AM, Joseph wrote:

> Here is the error I receive when I remove the file encoder.pyc and run
> the command cluster-fork
>
> Traceback (innermost last):
>   File "/opt/rocks/sbin/cluster-fork", line 88, in ?
>     import rocks.pssh
>   File "/opt/rocks/lib/python/rocks/pssh.py", line 96, in ?
>     import gmon.encoder
> ImportError: No module named encoder
>
> Thanks,
> Joseph
>
>
> On Tue, 2 Dec 2003, Mason J. Katz wrote:
>
>> Python creates the .pyc files for you, and does not remove the
>> original .py file. I would be extremely surprised if two "identical"
>> .pyc files had the same md5 checksum. I'd expect this to be more like
>> a C .o file which always contains random data to pad out to the end
>> of a page and 32/64 bit word sizes. Still this is just a guess; the
>> real point is you can always remove the .pyc files and the .py will
>> regenerate it when imported (although standard UNIX file/dir
>> permissions still apply).
>>
>> What is the import error you get from cluster-fork?
>>
>> -mjk
>>
>> On Dec 2, 2003, at 9:02 AM, Angel Li wrote:
>>
>>> Joseph wrote:
>>>
>>>> Indeed my md5sum is different for encoder.pyc. However, when I
>>>> pulled the file and run "cluster-fork" python responds about an
>>>> import problem. So it seems that regeneration did not occur. Is
>>>> there a flag I need to pass?
>>>>
>>>> I have also tried to figure out what package provides encoder and
>>>> reinstall the package, but an rpm query reveals nothing.
>>>>
>>>> If this is a generated file, what generates it?
>>>>
>>>> It seems that an rpm file query on ganglia shows that files in the
>>>> directory belong to the package, but encoder.pyc does not.
>>>>
>>>> Thanks,
>>>> Joseph
>>>>
>>> I have finally found the python sources in the HPC rolls CD, filename
>>> ganglia-python-3.0.0-2.i386.rpm. I'm not familiar with python but it
>>> seems python "compiles" the .py files to ".pyc" and then deletes the
>>> source file the first time they are referenced? I also noticed that
>>> there are two versions of python installed. Maybe the pyc files from
>>> one version won't load into the other one?
>>>
>>> Angel

From csamuel at vpac.org Wed Dec 3 18:09:26 2003
From: csamuel at vpac.org (Chris Samuel)
Date: Thu, 4 Dec 2003 13:09:26 +1100
Subject: [Rocks-Discuss]Confirmation of Rocks 3.1.0 Opteron support & RHEL trademark removal ?
Message-ID: <[email protected]>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi folks,

Can someone confirm that the next Rocks release will support Opteron please ?

Also, I noticed that the current Rocks release on Itanium based on RHEL still has a lot of mentions of RedHat in it, which from my reading of their trademark guidelines is not permitted. Is that fixed in the new version?

cheers!
Chris
- -- 
 Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin
 Victorian Partnership for Advanced Computing  http://www.vpac.org/
 Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE/zpdWO2KABBYQAh8RAqB8AJ9FG+IjIeem21qlFS6XYIHamIMPmwCghVTV
AgjAlVHWgdv/KzYQinHGPxs=
=IAWU
-----END PGP SIGNATURE-----

From bruno at rocksclusters.org Wed Dec 3 18:46:30 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Wed, 3 Dec 2003 18:46:30 -0800
Subject: [Rocks-Discuss]Confirmation of Rocks 3.1.0 Opteron support & RHEL trademark removal ?
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

> Can someone confirm that the next Rocks release will support Opteron
> please ?

yes, it will support opteron.

> Also, I noticed that the current Rocks release on Itanium based on RHEL
> still has a lot of mentions of RedHat in it, which from my reading of
> their trademark guidelines is not permitted, is that fixed in the new
> version ?

and yes, (even though it doesn't feel like the right thing to do, as redhat has offered to the community some outstanding technologies that we'd like to credit), all redhat trademarks will be removed from 3.1.0.

- gb

From fds at sdsc.edu Thu Dec 4 06:46:32 2003
From: fds at sdsc.edu (Federico Sacerdoti)
Date: Thu, 4 Dec 2003 06:46:32 -0800
Subject: [Rocks-Discuss]cluster-fork
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]>
	<[email protected]> <[email protected]>
	<[email protected]> <[email protected]>
	<[email protected]>
Message-ID: <[email protected]>

Please install the http://www.rocksclusters.org/errata/3.0.0/ganglia-python-3.0.1-2.i386.rpm package, which includes the correct encoder.py file. (This package is listed on the 3.0.0 errata page.)

-Federico

On Dec 3, 2003, at 8:59 AM, Joseph wrote:

> Here is the error I receive when I remove the file encoder.pyc and run
> the command cluster-fork
>
> Traceback (innermost last):
>   File "/opt/rocks/sbin/cluster-fork", line 88, in ?
>     import rocks.pssh
>   File "/opt/rocks/lib/python/rocks/pssh.py", line 96, in ?
>     import gmon.encoder
> ImportError: No module named encoder
>
> Thanks,
> Joseph
>
>
> On Tue, 2 Dec 2003, Mason J. Katz wrote:
>
>> Python creates the .pyc files for you, and does not remove the
>> original .py file. I would be extremely surprised if two "identical"
>> .pyc files had the same md5 checksum. I'd expect this to be more like
>> a C .o file which always contains random data to pad out to the end
>> of a page and 32/64 bit word sizes. Still this is just a guess; the
>> real point is you can always remove the .pyc files and the .py will
>> regenerate it when imported (although standard UNIX file/dir
>> permissions still apply).
>>
>> What is the import error you get from cluster-fork?
>>
>> -mjk
>>
>> On Dec 2, 2003, at 9:02 AM, Angel Li wrote:
>>
>>> Joseph wrote:
>>>
>>>> Indeed my md5sum is different for encoder.pyc. However, when I
>>>> pulled the file and run "cluster-fork" python responds about an
>>>> import problem. So it seems that regeneration did not occur. Is
>>>> there a flag I need to pass?
>>>>
>>>> I have also tried to figure out what package provides encoder and
>>>> reinstall the package, but an rpm query reveals nothing.
>>>>
>>>> If this is a generated file, what generates it?
>>>>
>>>> It seems that an rpm file query on ganglia shows that files in the
>>>> directory belong to the package, but encoder.pyc does not.
>>>>
>>>> Thanks,
>>>> Joseph
>>>>
>>> I have finally found the python sources in the HPC rolls CD, filename
>>> ganglia-python-3.0.0-2.i386.rpm. I'm not familiar with python but it
>>> seems python "compiles" the .py files to ".pyc" and then deletes the
>>> source file the first time they are referenced? I also noticed that
>>> there are two versions of python installed. Maybe the pyc files from
>>> one version won't load into the other one?
>>>
>>> Angel

Federico

Rocks Cluster Group, San Diego Supercomputing Center, CA


From jghobrial at uh.edu Thu Dec 4 07:14:21 2003
From: jghobrial at uh.edu (Joseph)
Date: Thu, 4 Dec 2003 09:14:21 -0600 (CST)
Subject: [Rocks-Discuss]cluster-fork
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]>
	<[email protected]> <[email protected]>
	<[email protected]> <[email protected]>
	<[email protected]> <[email protected]>
Message-ID: <[email protected]>

Thank you very much this solved the problem.

Joseph

On Thu, 4 Dec 2003, Federico Sacerdoti wrote:

> Please install the
> http://www.rocksclusters.org/errata/3.0.0/ganglia-python-3.0.1-2.i386.rpm
> package, which includes the correct encoder.py file. (This package is
> listed on the 3.0.0 errata page)
>
> -Federico
>
> On Dec 3, 2003, at 8:59 AM, Joseph wrote:
>
> > Here is the error I receive when I remove the file encoder.pyc and run
> > the command cluster-fork
> >
> > Traceback (innermost last):
> >   File "/opt/rocks/sbin/cluster-fork", line 88, in ?
> >     import rocks.pssh
> >   File "/opt/rocks/lib/python/rocks/pssh.py", line 96, in ?
> >     import gmon.encoder
> > ImportError: No module named encoder
> >
> > Thanks,
> > Joseph
> >
> >
> > On Tue, 2 Dec 2003, Mason J. Katz wrote:
> >
> >> Python creates the .pyc files for you, and does not remove the
> >> original .py file. I would be extremely surprised if two "identical"
> >> .pyc files had the same md5 checksum. I'd expect this to be more
> >> like a C .o file which always contains random data to pad out to
> >> the end of a page and 32/64 bit word sizes. Still this is just a
> >> guess; the real point is you can always remove the .pyc files and
> >> the .py will regenerate it when imported (although standard UNIX
> >> file/dir permissions still apply).
> >>
> >> What is the import error you get from cluster-fork?
> >>
> >> -mjk
> >>
> >> On Dec 2, 2003, at 9:02 AM, Angel Li wrote:
> >>
> >>> Joseph wrote:
> >>>
> >>>> Indeed my md5sum is different for encoder.pyc. However, when I
> >>>> pulled the file and run "cluster-fork" python responds about an
> >>>> import problem. So it seems that regeneration did not occur. Is
> >>>> there a flag I need to pass?
> >>>>
> >>>> I have also tried to figure out what package provides encoder and
> >>>> reinstall the package, but an rpm query reveals nothing.
> >>>>
> >>>> If this is a generated file, what generates it?
> >>>>
> >>>> It seems that an rpm file query on ganglia shows that files in the
> >>>> directory belong to the package, but encoder.pyc does not.
> >>>>
> >>>> Thanks,
> >>>> Joseph
> >>>>
> >>> I have finally found the python sources in the HPC rolls CD, filename
> >>> ganglia-python-3.0.0-2.i386.rpm. I'm not familiar with python but it
> >>> seems python "compiles" the .py files to ".pyc" and then deletes the
> >>> source file the first time they are referenced? I also noticed that
> >>> there are two versions of python installed. Maybe the pyc files from
> >>> one version won't load into the other one?
> >>>
> >>> Angel
>
> Federico
>
> Rocks Cluster Group, San Diego Supercomputing Center, CA

From vrowley at ucsd.edu Thu Dec 4 12:29:55 2003
From: vrowley at ucsd.edu (V. Rowley)
Date: Thu, 04 Dec 2003 12:29:55 -0800
Subject: [Rocks-Discuss]Re: PXE boot problems
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

Uh, nevermind. We had upgraded syslinux on our frontend, not the node we were trying to PXE boot. Sigh.

V. Rowley wrote:


> We have installed a ROCKS 3.0.0 frontend on a DL380 and are trying to
> install a compute node via PXE. We are getting an error similar to the
> one mentioned in the archives, e.g.
>
>> Loading initrd.img....
>> Ready
>>
>> Failed to free base memory
>
> We have upgraded to syslinux-2.07-1, per the suggestion in the archives,
> but continue to get the same error. Any ideas?

-- 
Vicky Rowley                              email: vrowley at ucsd.edu
Biomedical Informatics Research Network   work: (858) 536-5980
University of California, San Diego       fax: (858) 822-0828
9500 Gilman Drive
La Jolla, CA 92093-0715

See pictures from our trip to China at http://www.sagacitech.com/Chinaweb

From cdwan at mail.ahc.umn.edu Fri Dec 5 08:16:07 2003
From: cdwan at mail.ahc.umn.edu (Chris Dwan (CCGB))
Date: Fri, 5 Dec 2003 10:16:07 -0600 (CST)
Subject: [Rocks-Discuss]Private NIS master
Message-ID: <[email protected]>

Hello all. Long time listener, first time caller. Thanks for all the great work.

I'm integrating a Rocks cluster into an existing NIS domain. I noticed that while the cluster database now supports a PrivateNISMaster, that variable doesn't make it into /etc/yp.conf on the compute nodes; they remain in broadcast mode.

Assume that, for whatever reason, I don't want to set up a repeater (slave) ypserv process on my frontend. I added the option "--nisserver <var name="Kickstart_PrivateNISMaster"/>" to the "profiles/3.0.0/nodes/nis-client.xml" file, removed the ypserver on my frontend, and it works like I want it to.

Am I missing anything fundamental here?

-Chris Dwan
 University of Minnesota

From wyzhong78 at msn.com Mon Dec 8 06:18:34 2003
From: wyzhong78 at msn.com (zhong wenyu)
Date: Mon, 08 Dec 2003 22:18:34 +0800
Subject: [Rocks-Discuss]3.0.0 problem: not able to boot up
Message-ID: <[email protected]>

Hi, everyone!

I installed Rocks 3.0.0 with the defaults; there wasn't any trouble during the install. But it hasn't been able to boot: it stops at the beginning, the message "GRUB" shows on the screen, and it waits.... My hardware is dual Xeon 2.4G, MSI 9138, and a Seagate SCSI disk. Any help is appreciated!

_________________________________________________________________
MSN Explorer: http://explorer.msn.com/lccn/

From angelini at vki.ac.be Mon Dec 8 06:20:45 2003
From: angelini at vki.ac.be (Angelini Giuseppe)
Date: Mon, 08 Dec 2003 15:20:45 +0100
Subject: [Rocks-Discuss]How to use MPICH with ssh
Message-ID: <[email protected]>

Dear rocks folk,

I have recently installed mpich with Lahey Fortran, and now that I can compile and link, I would like to run. But it seems that I have another problem; in fact I get the following error message when I try to run:

[panara at compute-0-7 ~]$ mpirun -np $NPROC -machinefile $PBS_NODEFILE $DPT/hybflow
p0_13226: p4_error: Path to program is invalid while starting
/dc_03_04/panara/PREPRO_TESTS/hybflow with /usr/bin/rsh on compute-0-7: -1
    p4_error: latest msg from perror: No such file or directory
p0_13226: p4_error: Child process exited while making connection to
remote process on compute-0-6: 0
p0_13226: (6.025133) net_send: could not write to fd=4, errno = 32
p0_13226: (6.025231) net_send: could not write to fd=4, errno = 32

I am wondering why it is looking for /usr/bin/rsh for the communication; I expected it to use ssh and not rsh.

Any help will be welcome.

Regards.

Giuseppe Angelini

From casuj at cray.com Mon Dec 8 07:31:21 2003
From: casuj at cray.com (John Casu)
Date: Mon, 8 Dec 2003 07:31:21 -0800
Subject: [Rocks-Discuss]How to use MPICH with ssh
In-Reply-To: <[email protected]>; from Angelini Giuseppe on Mon, Dec 08, 2003 at 03:20:45PM +0100
References: <[email protected]>
Message-ID: <[email protected]>


On Mon, Dec 08, 2003 at 03:20:45PM +0100, Angelini Giuseppe wrote:
>
> Dear rocks folk,
>
>
> I have recently installed mpich with Lahay Fortran and now that I can
> compile and link,
> I would like to run but it seems that I have another problem. In fact I
> have the following
> error message when I try to run:
>
> [panara at compute-0-7 ~]$ mpirun -np $NPROC -machinefile $PBS_NODEFILE
> $DPT/hybflow
> p0_13226: p4_error: Path to program is invalid while starting
> /dc_03_04/panara/PREPRO_TESTS/hybflow with /usr/bin/rsh on compute-0-7:
> -1
>     p4_error: latest msg from perror: No such file or directory
> p0_13226: p4_error: Child process exited while making connection to
> remote process on compute-0-6: 0
> p0_13226: (6.025133) net_send: could not write to fd=4, errno = 32
> p0_13226: (6.025231) net_send: could not write to fd=4, errno = 32
>
> I am wondering why it is looking for /usr/bin/rsh for the communication,
>
> I expected to use ssh and not rsh.
>
> Any help will be welcome.
>

build mpich thus:

RSHCOMMAND=ssh ./configure .....
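John's one-liner expands to a rebuild along these lines. This is only a sketch: the source directory, device flag, and install prefix are assumptions, not taken from the thread; RSHCOMMAND=ssh is the part that stops the ch_p4 device from hard-wiring /usr/bin/rsh.

```shell
# Rebuild MPICH so its ch_p4 device launches remote processes over ssh
# instead of the compiled-in /usr/bin/rsh default.
cd mpich-1.2.5        # hypothetical unpacked MPICH1 source tree
RSHCOMMAND=ssh ./configure --with-device=ch_p4 --prefix=/opt/mpich-ssh
make && make install
```

After installing, point your PATH (and any mpirun wrappers) at the new prefix so jobs pick up the ssh-enabled binaries.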

> Regards.
>
>
> Giuseppe Angelini

-- 
"Roses are red, Violets are blue, You lookin' at me ? YOU LOOKIN' AT ME ?!" -- Get Fuzzy.
=======================================================================
John Casu
Cray Inc.                               casuj at cray.com
411 First Avenue South, Suite 600       Tel: (206) 701-2173
Seattle, WA 98104-2860                  Fax: (206) 701-2500
=======================================================================

From davidow at molbio.mgh.harvard.edu Mon Dec 8 08:12:53 2003
From: davidow at molbio.mgh.harvard.edu (Lance Davidow)
Date: Mon, 8 Dec 2003 11:12:53 -0500
Subject: [Rocks-Discuss]How to use MPICH with ssh
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <p06002001bbfa51fea005@[132.183.190.222]>

Giuseppe,

Here's an answer from a newbie who just faced the same problem.

You are using the wrong flavor of mpich (and mpirun). There are several different distributions, which work differently in ROCKS. The one you are using in the default path expects serv_p4 daemons and .rhosts files in your home directory. The different flavors may be more compatible with different compilers as well.

[lance at rescluster2 lance]$ which mpirun
/opt/mpich-mpd/gnu/bin/mpirun

The one you probably want is
/opt/mpich/gnu/bin/mpirun

[lance at rescluster2 lance]$ locate mpirun
...
/opt/mpich-mpd/gnu/bin/mpirun
...
/opt/mpich/myrinet/gnu/bin/mpirun
...
/opt/mpich/gnu/bin/mpirun
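Which flavor you get is purely a PATH question. A quick illustration with stand-in directories (the temp directories below are fakes standing in for /opt/mpich-mpd/gnu/bin and /opt/mpich/gnu/bin, so nothing here touches a real install):

```shell
#!/bin/sh
# Two fake mpirun wrappers standing in for the mpd and p4 flavors.
mpd=$(mktemp -d); p4=$(mktemp -d)
printf '#!/bin/sh\necho mpd-flavor\n' > "$mpd/mpirun"
printf '#!/bin/sh\necho p4-flavor\n'  > "$p4/mpirun"
chmod +x "$mpd/mpirun" "$p4/mpirun"

# Whichever directory comes first on PATH wins -- so putting
# /opt/mpich/gnu/bin ahead of /opt/mpich-mpd/gnu/bin selects the
# plain (p4) mpirun that the reply above recommends.
PATH="$p4:$mpd:$PATH" mpirun
```

The same idea applied for real would be `export PATH=/opt/mpich/gnu/bin:$PATH`, or simply invoking mpirun by its full path.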

Cheers,
Lance

At 3:20 PM +0100 12/8/03, Angelini Giuseppe wrote:
>Dear rocks folk,
>
>
>I have recently installed mpich with Lahay Fortran and now that I can
>compile and link,
>I would like to run but it seems that I have another problem. In fact I
>have the following
>error message when I try to run:
>
>[panara at compute-0-7 ~]$ mpirun -np $NPROC -machinefile $PBS_NODEFILE
>$DPT/hybflow
>p0_13226: p4_error: Path to program is invalid while starting
>/dc_03_04/panara/PREPRO_TESTS/hybflow with /usr/bin/rsh on compute-0-7:
>-1
> p4_error: latest msg from perror: No such file or directory
>p0_13226: p4_error: Child process exited while making connection to
>remote process on compute-0-6: 0
>p0_13226: (6.025133) net_send: could not write to fd=4, errno = 32
>p0_13226: (6.025231) net_send: could not write to fd=4, errno = 32
>
>I am wondering why it is looking for /usr/bin/rsh for the communication,
>
>I expected to use ssh and not rsh.
>
>Any help will be welcome.
>
>
>Regards.
>
>Giuseppe Angelini

-- 
Lance Davidow, PhD
Director of Bioinformatics
Dept of Molecular Biology
Mass General Hospital
Boston MA 02114
davidow at molbio.mgh.harvard.edu
617.726-5955
Fax: 617.726-6893

From rscarce at caci.com Fri Dec 5 16:43:00 2003
From: rscarce at caci.com (Reed Scarce)
Date: Fri, 5 Dec 2003 19:43:00 -0500
Subject: [Rocks-Discuss]PXE and system images
Message-ID: <OFF783DCCA.8F016562-ON85256DF3.008001FC-85256DF7.00043E45@caci.com>

We want to initialize new hardware with a known good image from identical hardware currently in use. The process imagined would be to PXE boot to a disk image server, PXE would create a RAM system that would request the system disk image from the server, which would push the desired system disk image to the requesting system. Upon completion the system would be available as a cluster member.

The lab configuration is a PC grade frontend with two 3Com 905s and a single server grade cluster node with integrated Intel 82551 (10/100) (the only PXE interface) and two integrated Intel 82546 (10/100/1000). The cluster node is one of the stock of nodes for the expansion. The stock of nodes have a Linux OS pre-installed, which would be eliminated in the process.

Currently the node will PXE boot from the 10/100 and pick up an installation boot from one of the g-bit interfaces. From there kickstart wants to take over.

Any recommendations how to get kickstart to push an image to the disk?

Thanks,

Reed Scarce
-------------- next part --------------
An HTML attachment was scrubbed...
URL: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/attachments/20031205/dad04521/attachment-0001.html

From wyzhong78 at msn.com Mon Dec 8 05:36:37 2003
From: wyzhong78 at msn.com (zhong wenyu)
Date: Mon, 08 Dec 2003 21:36:37 +0800
Subject: [Rocks-Discuss]Rocks 3.0.0 problem: not able to boot up
Message-ID: <[email protected]>

Hi, everyone!

I have installed Rocks 3.0.0 with default options successfully; there was not any trouble. But when I boot it up, it stops at the beginning, just showing "GRUB" on the screen and waiting...

Thanks for your help!

_________________________________________________________________
MSN Explorer: http://explorer.msn.com/lccn/

From daniel.kidger at quadrics.com Mon Dec 8 09:54:53 2003
From: daniel.kidger at quadrics.com (daniel.kidger at quadrics.com)
Date: Mon, 8 Dec 2003 17:54:53 -0000
Subject: [Rocks-Discuss]custom-kernels : naming conventions ? (Rocks 3.0.0)
Message-ID: <[email protected]>

Dear all,

Previously I have been installing a custom kernel on the compute nodes with an "extend-compute.xml" and an "/etc/init.d/qsconfigure" (to fix grub.conf).

However I am now trying to do it the 'proper' way. So I do (on :

# cp qsnet-RedHat-kernel-2.4.18-27.3.10qsnet.i686.rpm \
    /home/install/rocks-dist/7.3/en/os/i386/force/RPMS
# cd /home/install
# rocks-dist dist
# SSH_NO_PASSWD=1 shoot-node compute-0-0

Hence:

# find /home/install/ | xargs -l grep -nH qsnet

shows me that hdlist and hdlist2 now contain this RPM. (And indeed, if I duplicate my rpm in that directory, rocks-dist notices this and warns me.)

However the node always ends up with "2.4.20-20.7smp" again. anaconda-ks.cfg contains just "kernel-smp" and install.log has "Installing kernel-smp-2.4.20-20.7."

So my question is: it looks like my RPM has a name that Rocks doesn't understand properly. What is wrong with my name, and what are the rules for getting the correct name? (.i686.rpm is of course correct, but I don't have -smp. in the name. Is this the problem?)

cf. Greg Bruno's wisdom: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-April/001770.html

Yours,
Daniel.

--------------------------------------------------------------
Dr. Dan Kidger, Quadrics Ltd.      daniel.kidger at quadrics.com
One Bridewell St., Bristol, BS1 2AA, UK         0117 915 5505
----------------------- www.quadrics.com --------------------


From DGURGUL at PARTNERS.ORG Mon Dec 8 11:09:27 2003
From: DGURGUL at PARTNERS.ORG (Gurgul, Dennis J.)
Date: Mon, 8 Dec 2003 14:09:27 -0500
Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
Message-ID: <BC447F1AD529D311B4DE0008C71BF2EB0AE15840@phsexch7.mgh.harvard.edu>

I just did "cluster-fork -Uvh /sourcedir/ganglia-python-3.0.1-2.i386.rpm" and then "cluster-fork service gschedule restart" (not sure I had to do the last). I also put 3.0.1-2 and restarted gschedule on the frontend.

Now I run "cluster-fork --mpd w".

I currently have a user who ssh'd to compute-0-8 from the frontend and one who ssh'd into compute-0-17 from the front end.

But the return shows the users on lines for 17 (for the user on 0-8) and 10 (for the user on 0-17):

17:  1:58pm  up 24 days, 3:20,  1 user,  load average: 0.00, 0.00, 0.03
17: USER     TTY      FROM              LOGIN@   IDLE    JCPU   PCPU  WHAT
17: lance    pts/0    rescluster2.mgh.  1:31pm   40.00s  0.02s  0.02s -bash

10:  1:58pm  up 24 days, 3:21,  1 user,  load average: 0.02, 0.04, 0.07
10: USER     TTY      FROM              LOGIN@   IDLE    JCPU   PCPU  WHAT
10: dennis   pts/0    rescluster2.mgh.  1:57pm   17.00s  0.02s  0.02s -bash

When I do "cluster-fork w" (without the --mpd), the users show up on the correct nodes.

Do the numbers on the left of the --mpd output correspond to the node names?

Thanks.

Dennis

Dennis J. Gurgul
Partners Health Care System
Research Management
Research Computing Core
617.724.3169

From DGURGUL at PARTNERS.ORG Mon Dec 8 11:28:30 2003
From: DGURGUL at PARTNERS.ORG (Gurgul, Dennis J.)
Date: Mon, 8 Dec 2003 14:28:30 -0500
Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
Message-ID: <BC447F1AD529D311B4DE0008C71BF2EB0AE15843@phsexch7.mgh.harvard.edu>

Maybe this is a better description of the "strangeness".

I did "cluster-fork --mpd hostname":

1: compute-0-0.local
2: compute-0-1.local
3: compute-0-3.local
4: compute-0-13.local
5: compute-0-11.local
6: compute-0-15.local
7: compute-0-16.local
8: compute-0-19.local
9: compute-0-21.local
10: compute-0-17.local
11: compute-0-5.local
12: compute-0-20.local
13: compute-0-18.local
14: compute-0-12.local
15: compute-0-9.local
16: compute-0-4.local
17: compute-0-8.local
18: compute-0-14.local
19: compute-0-2.local
20: compute-0-6.local
0: compute-0-7.local
21: compute-0-10.local
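The numeric prefixes in output like the above appear to be the ranks the mpd launch assigned, in launch order, rather than node numbers (an inference from the output shape, not confirmed in the thread). One way to tame the interleaving when comparing runs is to sort on the numeric prefix; sample lines are inlined here so the command is self-contained:

```shell
#!/bin/sh
# Sort "rank: hostname" lines numerically by the rank prefix.
printf '10: compute-0-17.local\n2: compute-0-1.local\n0: compute-0-7.local\n' \
  | sort -t: -k1,1n
```

The same pipe applied to real `cluster-fork --mpd hostname` output gives a stable rank-to-node table.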

Dennis J. Gurgul
Partners Health Care System
Research Management
Research Computing Core
617.724.3169

-----Original Message-----
From: npaci-rocks-discussion-admin at sdsc.edu
[mailto:npaci-rocks-discussion-admin at sdsc.edu] On Behalf Of Gurgul, Dennis J.
Sent: Monday, December 08, 2003 2:09 PM
To: npaci-rocks-discussion at sdsc.edu
Subject: [Rocks-Discuss]cluster-fork --mpd strangeness

I just did "cluster-fork -Uvh /sourcedir/ganglia-python-3.0.1-2.i386.rpm" and then "cluster-fork service gschedule restart" (not sure I had to do the last). I also put 3.0.1-2 and restarted gschedule on the frontend.

Now I run "cluster-fork --mpd w".

I currently have a user who ssh'd to compute-0-8 from the frontend and one who ssh'd into compute-0-17 from the front end.

But the return shows the users on lines for 17 (for the user on 0-8) and 10 (for the user on 0-17):

17:  1:58pm  up 24 days, 3:20,  1 user,  load average: 0.00, 0.00, 0.03
17: USER     TTY      FROM              LOGIN@   IDLE    JCPU   PCPU  WHAT
17: lance    pts/0    rescluster2.mgh.  1:31pm   40.00s  0.02s  0.02s -bash

10:  1:58pm  up 24 days, 3:21,  1 user,  load average: 0.02, 0.04, 0.07
10: USER     TTY      FROM              LOGIN@   IDLE    JCPU   PCPU  WHAT
10: dennis   pts/0    rescluster2.mgh.  1:57pm   17.00s  0.02s  0.02s -bash

When I do "cluster-fork w" (without the --mpd), the users show up on the correct nodes.

Do the numbers on the left of the -mpd output correspond to the node names?


Thanks.

Dennis

Dennis J. Gurgul
Partners Health Care System
Research Management
Research Computing Core
617.724.3169

From tim.carlson at pnl.gov Mon Dec 8 12:35:16 2003
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Mon, 08 Dec 2003 12:35:16 -0800 (PST)
Subject: [Rocks-Discuss]PXE and system images
In-Reply-To: <OFF783DCCA.8F016562-ON85256DF3.008001FC-85256DF7.00043E45@caci.com>
Message-ID: <[email protected]>

On Fri, 5 Dec 2003, Reed Scarce wrote:

> We want to initialize new hardware with a known good image from identical
> hardware currently in use. The process imagined would be to PXE boot to a
> disk image server, PXE would create a RAM system that would request the
> system disk image from the server, which would push the desired system
> disk image to the requesting system. Upon completion the system would be
> available as a cluster member.
>
> The lab configuration is a PC grade frontend with two 3Com 905s and a
> single server grade cluster node with integrated Intel 82551 (10/100) (the
> only PXE interface) and two integrated Intel 82546 (10/100/1000). The
> cluster node is one of the stock of nodes for the expansion. The stock of
> nodes have a Linux OS pre-installed, which would be eliminated in the
> process.
>
> Currently the node will PXE boot from the 10/100 and pickup an
> installation boot from one of the g-bit interfaces. From there kickstart
> wants to take over.
>
> Any recommendations how to get kickstart to push an image to the disk?

This sounds like you want to use Oscar instead of ROCKS.

http://oscar.openclustergroup.org/tiki-index.php

I'm not exactly sure why you think that the kickstart process won't give you exactly the same image on every machine. If the hardware is the same, you'll get the same image on each machine.

We have boxes with the same setup, 10/100 PXE, and then dual gigabit. Our method for installing ROCKS on this type of hardware is the following:

1) Run insert-ethers and choose "manager" type of node.
2) Connect all the PXE interfaces to the switch and boot them all. Do not connect the gigabit interface.
3) Once all of the nodes have PXE booted, exit insert-ethers. Start insert-ethers again and this time choose compute node.
4) Hook up the gigabit interface and the PXE interface to your nodes. All of your machines will now install.
5) In our case, we now quickly disconnect the PXE interface because we don't want to have the machine continually install. The real ROCKS method would have you choose (HD/net) for booting in the BIOS, but if you already have an OS on your machine, you would have to go into the BIOS twice before the compute nodes were installed. We disable rocks-grub and just connect up the PXE cable if we need to reinstall.

Tim

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support

From tim.carlson at pnl.gov Mon Dec 8 12:42:23 2003
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Mon, 08 Dec 2003 12:42:23 -0800 (PST)
Subject: [Rocks-Discuss]custom-kernels : naming conventions ? (Rocks 3.0.0)
In-Reply-To: <[email protected]>
Message-ID: <[email protected]>

On Mon, 8 Dec 2003 daniel.kidger at quadrics.com wrote:

I've gotten confused from time to time as to where to place custom RPMS (it's changed between releases), so my not-so-clean method is to just rip out the kernels in /home/install/rocks-dist/7.3/en/os/i386/Redhat/RPMS and drop my own in. Then do a

cd /home/install
rocks-dist dist
shoot-node

You are probably running into an issue where the "force" directory is more of an "in addition to" directory and your 2.4.18 kernel is being noted, but ignored since the 2.4.20 kernel is newer. I assume your nodes get both an SMP and a UP version of 2.4.20 and that your custom 2.4.18 is nowhere to be found on the compute node.
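The "newer kernel wins" behavior Tim describes can be illustrated with a toy version comparison. This is a sketch only, not the actual anaconda/RPM dependency logic:

```python
# Toy illustration (NOT anaconda's real depsolver): when two packages share
# the package name "kernel", the installer keeps the higher version, so a
# custom 2.4.18 dropped into force/RPMS loses to the stock 2.4.20.

def version_key(version):
    """Split a dotted version like '2.4.18' into a comparable integer tuple."""
    return tuple(int(part) for part in version.split("."))

def newest(versions):
    """Return the version a newest-wins installer would prefer."""
    return max(versions, key=version_key)

print(newest(["2.4.18", "2.4.20"]))  # prints 2.4.20 -- the custom kernel is ignored
```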

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support

> Previously I have been installing a custom kernel on the compute nodes
> with an "extend-compute.xml" and an "/etc/init.d/qsconfigure" (to fix grub.conf).
>
> However I am now trying to do it the 'proper' way. So I do (on :
> # cp qsnet-RedHat-kernel-2.4.18-27.3.10qsnet.i686.rpm \
>      /home/install/rocks-dist/7.3/en/os/i386/force/RPMS
> # cd /home/install
> # rocks-dist dist
> # SSH_NO_PASSWD=1 shoot-node compute-0-0
>
> Hence:
> # find /home/install/ |xargs -l grep -nH qsnet


> shows me that hdlist and hdlist2 now contain this RPM. (and indeed if I
> duplicate my rpm in that directory rocks-dist notices this and warns me.)
>
> However the node always ends up with "2.4.20-20.7smp" again.
> anaconda-ks.cfg contains just "kernel-smp" and install.log has
> "Installing kernel-smp-2.4.20-20.7."
>
> So my question is:
> It looks like my RPM has a name that Rocks doesn't understand properly.
> What is wrong with my name ?
> and what are the rules for getting the correct name ?
> (.i686.rpm is of course correct, but I don't have -smp. in the name. Is this the problem ?)
>
> cf. Greg Bruno's wisdom:
> https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-April/001770.html
>
> Yours,
> Daniel.

From fds at sdsc.edu Mon Dec 8 12:51:12 2003
From: fds at sdsc.edu (Federico Sacerdoti)
Date: Mon, 8 Dec 2003 12:51:12 -0800
Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
In-Reply-To: <BC447F1AD529D311B4DE0008C71BF2EB0AE15843@phsexch7.mgh.harvard.edu>
References: <BC447F1AD529D311B4DE0008C71BF2EB0AE15843@phsexch7.mgh.harvard.edu>
Message-ID: <[email protected]>

You are right, and I think this is a shortcoming of MPD. There is no obvious way to force the MPD numbering to correspond to the order the nodes were called out on the command line (cluster-fork --mpd actually makes a shell call to mpirun and it calls out all the node names explicitly). MPD seems to number the output differently, as you found out.

So mpd for now may be more useful for jobs that are not sensitive to this. If enough of you find this shortcoming to be a real annoyance, we could work on putting the node name label on the output by explicitly calling "hostname" or similar.
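The hostname-labeling idea could be approximated today as a post-processing step: run "cluster-fork --mpd hostname" once to learn which rank MPD gave each node, then use that map to relabel the output of any later --mpd command. A hypothetical sketch (the helper names and parsing are assumptions, not part of cluster-fork):

```python
# Hypothetical post-processing sketch: recover node names from MPD's
# rank-prefixed output using a previously captured rank->hostname map.

def rank_to_host(mpd_hostname_output):
    """Parse lines like '17: compute-0-8.local' into {rank: hostname}."""
    mapping = {}
    for line in mpd_hostname_output.splitlines():
        rank, sep, host = line.partition(": ")
        if sep and rank.strip().isdigit():
            mapping[int(rank)] = host.strip()
    return mapping

def relabel(mpd_output, mapping):
    """Replace each 'rank:' prefix with the node name MPD assigned that rank."""
    labeled = []
    for line in mpd_output.splitlines():
        rank, sep, rest = line.partition(": ")
        if sep and rank.strip().isdigit():
            labeled.append(f"{mapping[int(rank)]}: {rest}")
        else:
            labeled.append(line)
    return "\n".join(labeled)

hosts = rank_to_host("17: compute-0-8.local\n10: compute-0-17.local")
print(relabel("17: lance pts/0", hosts))  # prints compute-0-8.local: lance pts/0
```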

Good ideas are welcome :)
-Federico

On Dec 8, 2003, at 11:28 AM, Gurgul, Dennis J. wrote:

> Maybe this is a better description of the "strangeness".
>
> I did "cluster-fork --mpd hostname":
>
> 1: compute-0-0.local
> 2: compute-0-1.local
> 3: compute-0-3.local
> 4: compute-0-13.local
> 5: compute-0-11.local
> 6: compute-0-15.local
> 7: compute-0-16.local


> 8: compute-0-19.local
> 9: compute-0-21.local
> 10: compute-0-17.local
> 11: compute-0-5.local
> 12: compute-0-20.local
> 13: compute-0-18.local
> 14: compute-0-12.local
> 15: compute-0-9.local
> 16: compute-0-4.local
> 17: compute-0-8.local
> 18: compute-0-14.local
> 19: compute-0-2.local
> 20: compute-0-6.local
> 0: compute-0-7.local
> 21: compute-0-10.local
>
> Dennis J. Gurgul
> Partners Health Care System
> Research Management
> Research Computing Core
> 617.724.3169
>
> -----Original Message-----
> From: npaci-rocks-discussion-admin at sdsc.edu
> [mailto:npaci-rocks-discussion-admin at sdsc.edu]On Behalf Of Gurgul,
> Dennis J.
> Sent: Monday, December 08, 2003 2:09 PM
> To: npaci-rocks-discussion at sdsc.edu
> Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
>
> I just did "cluster-fork -Uvh /sourcedir/ganglia-python-3.0.1-2.i386.rpm"
> and then "cluster-fork service gschedule restart" (not sure I had to do the
> last).
> I also put 3.0.1-2 and restarted gschedule on the frontend.
>
> Now I run "cluster-fork --mpd w".
>
> I currently have a user who ssh'd to compute-0-8 from the frontend and one
> who ssh'd into compute-0-17 from the front end.
>
> But the return shows the users on lines for 17 (for the user on 0-8) and 10
> (for the user on 0-17):
>
> 17:  1:58pm  up 24 days,  3:20,  1 user,  load average: 0.00, 0.00, 0.03
> 17: USER     TTY      FROM             LOGIN@   IDLE   JCPU   PCPU  WHAT
> 17: lance    pts/0    rescluster2.mgh.  1:31pm 40.00s  0.02s  0.02s  -bash
>
> 10:  1:58pm  up 24 days,  3:21,  1 user,  load average: 0.02, 0.04,


> 0.07
> 10: USER     TTY      FROM             LOGIN@   IDLE   JCPU   PCPU  WHAT
> 10: dennis   pts/0    rescluster2.mgh.  1:57pm 17.00s  0.02s  0.02s  -bash
>
> When I do "cluster-fork w" (without the --mpd) the users show up on the
> correct nodes.
>
> Do the numbers on the left of the -mpd output correspond to the node names?
>
> Thanks.
>
> Dennis
>
> Dennis J. Gurgul
> Partners Health Care System
> Research Management
> Research Computing Core
> 617.724.3169

Federico

Rocks Cluster Group, San Diego Supercomputing Center, CA

From DGURGUL at PARTNERS.ORG Mon Dec 8 12:55:13 2003
From: DGURGUL at PARTNERS.ORG (Gurgul, Dennis J.)
Date: Mon, 8 Dec 2003 15:55:13 -0500
Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
Message-ID: <BC447F1AD529D311B4DE0008C71BF2EB0AE15847@phsexch7.mgh.harvard.edu>

Thanks.

On a related note, when I did "cluster-fork service gschedule restart" gschedule started with the "OK" output, but then the fork process hung on each node and I had to ^c out for it to go on to the next node.

I tried to ssh to a node and then did the gschedule restart. Even then, after I tried to "exit" out of the node, the session hung and I had to log back in and kill it from the frontend.

Dennis J. Gurgul
Partners Health Care System
Research Management
Research Computing Core
617.724.3169

-----Original Message-----
From: Federico Sacerdoti [mailto:fds at sdsc.edu]
Sent: Monday, December 08, 2003 3:51 PM
To: Gurgul, Dennis J.
Cc: npaci-rocks-discussion at sdsc.edu
Subject: Re: [Rocks-Discuss]cluster-fork --mpd strangeness


You are right, and I think this is a shortcoming of MPD. There is no obvious way to force the MPD numbering to correspond to the order the nodes were called out on the command line (cluster-fork --mpd actually makes a shell call to mpirun and it calls out all the node names explicitly). MPD seems to number the output differently, as you found out.

So mpd for now may be more useful for jobs that are not sensitive to this. If enough of you find this shortcoming to be a real annoyance, we could work on putting the node name label on the output by explicitly calling "hostname" or similar.

Good ideas are welcome :)
-Federico

On Dec 8, 2003, at 11:28 AM, Gurgul, Dennis J. wrote:

> Maybe this is a better description of the "strangeness".
>
> I did "cluster-fork --mpd hostname":
>
> 1: compute-0-0.local
> 2: compute-0-1.local
> 3: compute-0-3.local
> 4: compute-0-13.local
> 5: compute-0-11.local
> 6: compute-0-15.local
> 7: compute-0-16.local
> 8: compute-0-19.local
> 9: compute-0-21.local
> 10: compute-0-17.local
> 11: compute-0-5.local
> 12: compute-0-20.local
> 13: compute-0-18.local
> 14: compute-0-12.local
> 15: compute-0-9.local
> 16: compute-0-4.local
> 17: compute-0-8.local
> 18: compute-0-14.local
> 19: compute-0-2.local
> 20: compute-0-6.local
> 0: compute-0-7.local
> 21: compute-0-10.local
>
> Dennis J. Gurgul
> Partners Health Care System
> Research Management
> Research Computing Core
> 617.724.3169
>
> -----Original Message-----
> From: npaci-rocks-discussion-admin at sdsc.edu
> [mailto:npaci-rocks-discussion-admin at sdsc.edu]On Behalf Of Gurgul,
> Dennis J.
> Sent: Monday, December 08, 2003 2:09 PM
> To: npaci-rocks-discussion at sdsc.edu


> Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
>
> I just did "cluster-fork -Uvh /sourcedir/ganglia-python-3.0.1-2.i386.rpm"
> and then "cluster-fork service gschedule restart" (not sure I had to do the
> last).
> I also put 3.0.1-2 and restarted gschedule on the frontend.
>
> Now I run "cluster-fork --mpd w".
>
> I currently have a user who ssh'd to compute-0-8 from the frontend and one
> who ssh'd into compute-0-17 from the front end.
>
> But the return shows the users on lines for 17 (for the user on 0-8) and 10
> (for the user on 0-17):
>
> 17:  1:58pm  up 24 days,  3:20,  1 user,  load average: 0.00, 0.00, 0.03
> 17: USER     TTY      FROM             LOGIN@   IDLE   JCPU   PCPU  WHAT
> 17: lance    pts/0    rescluster2.mgh.  1:31pm 40.00s  0.02s  0.02s  -bash
>
> 10:  1:58pm  up 24 days,  3:21,  1 user,  load average: 0.02, 0.04, 0.07
> 10: USER     TTY      FROM             LOGIN@   IDLE   JCPU   PCPU  WHAT
> 10: dennis   pts/0    rescluster2.mgh.  1:57pm 17.00s  0.02s  0.02s  -bash
>
> When I do "cluster-fork w" (without the --mpd) the users show up on the
> correct nodes.
>
> Do the numbers on the left of the -mpd output correspond to the node names?
>
> Thanks.
>
> Dennis
>
> Dennis J. Gurgul
> Partners Health Care System
> Research Management
> Research Computing Core
> 617.724.3169

Federico

Rocks Cluster Group, San Diego Supercomputing Center, CA

From mjk at sdsc.edu Mon Dec 8 12:58:22 2003


From: mjk at sdsc.edu (Mason J. Katz)
Date: Mon, 8 Dec 2003 12:58:22 -0800
Subject: [Rocks-Discuss]PXE and system images
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

On Dec 8, 2003, at 12:35 PM, Tim Carlson wrote:

> 5) In our case, we now quickly disconnect the PXE interface because we
> don't want to have the machine continually install. The real ROCKS
> method would have you choose (HD/net) for booting in the BIOS, but
> if you already have an OS on your machine, you would have to go into
> the BIOS twice before the compute nodes were installed. We disable
> rocks-grub and just connect up the PXE cable if we need to reinstall.

For most boxes we've seen that support PXE there is an option to hit <F12> to force a network PXE boot; this allows you to force a PXE boot even when a valid OS/boot block exists on your hard disk. If you don't have this, you do indeed need to go into BIOS twice -- a pain.

-mjk

From fds at sdsc.edu Mon Dec 8 13:26:46 2003
From: fds at sdsc.edu (Federico Sacerdoti)
Date: Mon, 8 Dec 2003 13:26:46 -0800
Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
In-Reply-To: <BC447F1AD529D311B4DE0008C71BF2EB0AE15847@phsexch7.mgh.harvard.edu>
References: <BC447F1AD529D311B4DE0008C71BF2EB0AE15847@phsexch7.mgh.harvard.edu>
Message-ID: <[email protected]>

I've seen this before as well. I believe it has something to do with the way the color "[ OK ]" characters are interacting with the ssh session from the normal cluster-fork. We have yet to characterize this bug adequately.

-Federico

On Dec 8, 2003, at 12:55 PM, Gurgul, Dennis J. wrote:

> Thanks.
>
> On a related note, when I did "cluster-fork service gschedule restart"
> gschedule started with the "OK" output, but then the fork process hung
> on each node and I had to ^c out for it to go on to the next node.
>
> I tried to ssh to a node and then did the gschedule restart. Even then,
> after I tried to "exit" out of the node, the session hung and I had to
> log back in and kill it from the frontend.


> Dennis J. Gurgul
> Partners Health Care System
> Research Management
> Research Computing Core
> 617.724.3169
>
> -----Original Message-----
> From: Federico Sacerdoti [mailto:fds at sdsc.edu]
> Sent: Monday, December 08, 2003 3:51 PM
> To: Gurgul, Dennis J.
> Cc: npaci-rocks-discussion at sdsc.edu
> Subject: Re: [Rocks-Discuss]cluster-fork --mpd strangeness
>
> You are right, and I think this is a shortcoming of MPD. There is no
> obvious way to force the MPD numbering to correspond to the order the
> nodes were called out on the command line (cluster-fork --mpd actually
> makes a shell call to mpirun and it calls out all the node names
> explicitly). MPD seems to number the output differently, as you found
> out.
>
> So mpd for now may be more useful for jobs that are not sensitive to
> this. If enough of you find this shortcoming to be a real annoyance, we
> could work on putting the node name label on the output by explicitly
> calling "hostname" or similar.
>
> Good ideas are welcome :)
> -Federico
>
> On Dec 8, 2003, at 11:28 AM, Gurgul, Dennis J. wrote:
>
>> Maybe this is a better description of the "strangeness".
>>
>> I did "cluster-fork --mpd hostname":
>>
>> 1: compute-0-0.local
>> 2: compute-0-1.local
>> 3: compute-0-3.local
>> 4: compute-0-13.local
>> 5: compute-0-11.local
>> 6: compute-0-15.local
>> 7: compute-0-16.local
>> 8: compute-0-19.local
>> 9: compute-0-21.local
>> 10: compute-0-17.local
>> 11: compute-0-5.local
>> 12: compute-0-20.local
>> 13: compute-0-18.local
>> 14: compute-0-12.local
>> 15: compute-0-9.local
>> 16: compute-0-4.local
>> 17: compute-0-8.local
>> 18: compute-0-14.local
>> 19: compute-0-2.local
>> 20: compute-0-6.local
>> 0: compute-0-7.local


>> 21: compute-0-10.local
>>
>> Dennis J. Gurgul
>> Partners Health Care System
>> Research Management
>> Research Computing Core
>> 617.724.3169
>>
>> -----Original Message-----
>> From: npaci-rocks-discussion-admin at sdsc.edu
>> [mailto:npaci-rocks-discussion-admin at sdsc.edu]On Behalf Of Gurgul,
>> Dennis J.
>> Sent: Monday, December 08, 2003 2:09 PM
>> To: npaci-rocks-discussion at sdsc.edu
>> Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
>>
>> I just did "cluster-fork -Uvh /sourcedir/ganglia-python-3.0.1-2.i386.rpm"
>> and then "cluster-fork service gschedule restart" (not sure I had to do the
>> last).
>> I also put 3.0.1-2 and restarted gschedule on the frontend.
>>
>> Now I run "cluster-fork --mpd w".
>>
>> I currently have a user who ssh'd to compute-0-8 from the frontend and one
>> who ssh'd into compute-0-17 from the front end.
>>
>> But the return shows the users on lines for 17 (for the user on 0-8) and 10
>> (for the user on 0-17):
>>
>> 17:  1:58pm  up 24 days,  3:20,  1 user,  load average: 0.00, 0.00, 0.03
>> 17: USER     TTY      FROM             LOGIN@   IDLE   JCPU   PCPU  WHAT
>> 17: lance    pts/0    rescluster2.mgh.  1:31pm 40.00s  0.02s  0.02s  -bash
>>
>> 10:  1:58pm  up 24 days,  3:21,  1 user,  load average: 0.02, 0.04, 0.07
>> 10: USER     TTY      FROM             LOGIN@   IDLE   JCPU   PCPU  WHAT
>> 10: dennis   pts/0    rescluster2.mgh.  1:57pm 17.00s  0.02s  0.02s  -bash
>>
>> When I do "cluster-fork w" (without the --mpd) the users show up on the
>> correct nodes.
>>
>> Do the numbers on the left of the -mpd output correspond to the node names?


>>
>> Thanks.
>>
>> Dennis
>>
>> Dennis J. Gurgul
>> Partners Health Care System
>> Research Management
>> Research Computing Core
>> 617.724.3169
>
> Federico
>
> Rocks Cluster Group, San Diego Supercomputing Center, CA

Federico

Rocks Cluster Group, San Diego Supercomputing Center, CA

From bruno at rocksclusters.org Mon Dec 8 15:31:08 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Mon, 8 Dec 2003 15:31:08 -0800
Subject: [Rocks-Discuss]Rocks 3.0.0 problem:not able to boot up
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

> I have installed Rocks 3.0.0 with default options successful, there was
> not any trouble. But I boot it up, it stopped at beginning, just show
> "GRUB" on the screen and waiting...

when you built the frontend, did you start with the rocks base CD then add the HPC roll?

- gb

From bruno at rocksclusters.org Mon Dec 8 15:37:46 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Mon, 8 Dec 2003 15:37:46 -0800
Subject: [Rocks-Discuss]custom-kernels : naming conventions ? (Rocks 3.0.0)
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

> Previously I have been installing a custom kernel on the compute nodes
> with an "extend-compute.xml" and an "/etc/init.d/qsconfigure" (to fix grub.conf).
>
> However I am now trying to do it the 'proper' way. So I do (on :
> # cp qsnet-RedHat-kernel-2.4.18-27.3.10qsnet.i686.rpm \
>      /home/install/rocks-dist/7.3/en/os/i386/force/RPMS
> # cd /home/install
> # rocks-dist dist
> # SSH_NO_PASSWD=1 shoot-node compute-0-0


>
> Hence:
> # find /home/install/ |xargs -l grep -nH qsnet
> shows me that hdlist and hdlist2 now contain this RPM. (and indeed if I
> duplicate my rpm in that directory rocks-dist notices this and warns me.)
>
> However the node always ends up with "2.4.20-20.7smp" again.
> anaconda-ks.cfg contains just "kernel-smp" and install.log has
> "Installing kernel-smp-2.4.20-20.7."
>
> So my question is:
> It looks like my RPM has a name that Rocks doesn't understand properly.
> What is wrong with my name ?
> and what are the rules for getting the correct name ?
> (.i686.rpm is of course correct, but I don't have -smp. in the name. Is this the problem ?)

the anaconda installer looks for kernel packages with a specific format:

kernel-<kernel ver>-<redhat ver>.i686.rpm

and for smp nodes:

kernel-smp-<kernel ver>-<redhat ver>.i686.rpm
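As a rough illustration of why the qsnet RPM is skipped, the filename can be checked against the pattern above. This is a simplification for illustration only; anaconda's real package matching is more involved:

```python
# Simplified check (assumption: pattern taken from the convention quoted
# above, not from anaconda's actual source): does an RPM filename look like
# a kernel package the installer will pick up?
import re

# kernel-<kernel ver>-<redhat ver>.i686.rpm, optionally kernel-smp-...
KERNEL_RPM = re.compile(r"^kernel(-smp)?-[\d.]+-[^-]+\.i686\.rpm$")

def installer_recognizes(filename):
    return bool(KERNEL_RPM.match(filename))

print(installer_recognizes("kernel-smp-2.4.20-20.7.i686.rpm"))                   # True
print(installer_recognizes("qsnet-RedHat-kernel-2.4.18-27.3.10qsnet.i686.rpm"))  # False
```

The second filename fails because it does not begin with "kernel-", which is consistent with the node falling back to the stock kernel-smp package.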

we have made the necessary patches to files under /usr/src/linux-2.4 in order to produce redhat-compliant kernels. see:

http://www.rocksclusters.org/rocks-documentation/3.0.0/customization-kernel.html

also, would you be interested in making your changes for the quadrics interconnect available to the general rocks community?

- gb

From purikk at hotmail.com Mon Dec 8 20:23:35 2003
From: purikk at hotmail.com (purushotham komaravolu)
Date: Mon, 8 Dec 2003 23:23:35 -0500
Subject: [Rocks-Discuss]AMD Opteron
References: <[email protected]>
Message-ID: <[email protected]>

Hello, I am a newbie to ROCKS cluster. I wanted to set up clusters on 32-bit architectures (Intel and AMD) and 64-bit architectures (Intel and AMD). I found the 64-bit download for Intel on the website but not for AMD. Does it work for AMD Opteron? If not, what is the ETA for AMD-64? We are planning to buy AMD-64 bit machines shortly, and I would like to volunteer for the beta testing if needed.
Thanks
Regards,
Puru


From mjk at sdsc.edu Tue Dec 9 07:28:51 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Tue, 9 Dec 2003 07:28:51 -0800
Subject: [Rocks-Discuss]AMD Opteron
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]>
Message-ID: <[email protected]>

We have a beta right now that we have sent to a few people. We plan on a release this month, and AMD_64 will be part of this release along with the usual x86, IA64 support.

If you want to help accelerate this process please talk to your vendor about loaning/giving us some hardware for testing. Having access to a variety of Opteron hardware (we own two boxes) is the only way we can have good support for this chip.

-mjk

On Dec 8, 2003, at 8:23 PM, purushotham komaravolu wrote:

> Hello,
> I am a newbie to ROCKS cluster. I wanted to setup clusters on
> 32-bit Architectures( Intel and AMD) and 64-bit Architecture( Intel and
> AMD).
> I found the 64-bit download for Intel on the website but not for AMD. Does
> it work for AMD opteron? if not what is the ETA for AMD-64.
> We are planning to but AMD-64 bit machines shortly, and I would like to
> volunteer for the beta testing if needed.
> Thanks
> Regards,
> Puru

From cdmaest at sandia.gov Tue Dec 9 07:48:31 2003
From: cdmaest at sandia.gov (Christopher D. Maestas)
Date: Tue, 09 Dec 2003 08:48:31 -0700
Subject: [Rocks-Discuss]AMD Opteron
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]> <[email protected]>
Message-ID: <[email protected]>

What do I have to do to sign up to test? We have Opteron systems we can test on here.

On Tue, 2003-12-09 at 08:28, Mason J. Katz wrote:
> We have a beta right now that we have sent to a few people. We plan on
> a release this month, and AMD_64 will be part of this release along
> with the usual x86, IA64 support.
>


> If you want to help accelerate this process please talk to your vendor
> about loaning/giving us some hardware for testing. Having access to a
> variety of Opteron hardware (we own two boxes) is the only way we can
> have good support for this chip.
>
> -mjk
>
> On Dec 8, 2003, at 8:23 PM, purushotham komaravolu wrote:
>
>> Hello,
>> I am a newbie to ROCKS cluster. I wanted to setup clusters on
>> 32-bit Architectures( Intel and AMD) and 64-bit Architecture( Intel and
>> AMD).
>> I found the 64-bit download for Intel on the website but not for AMD. Does
>> it work for AMD opteron? if not what is the ETA for AMD-64.
>> We are planning to but AMD-64 bit machines shortly, and I would like to
>> volunteer for the beta testing if needed.
>> Thanks
>> Regards,
>> Puru

From vincent_b_fox at yahoo.com Tue Dec 9 11:10:40 2003
From: vincent_b_fox at yahoo.com (Vincent Fox)
Date: Tue, 9 Dec 2003 11:10:40 -0800 (PST)
Subject: [Rocks-Discuss]ATLAS rpm build problems on PII platform
Message-ID: <[email protected]>

I tried doing a rebuild of the ATLAS libraries on a PII test cluster and no go. Did an export PATH=/opt/gcc32/bin:$PATH first to make it easy on myself.

The "make rpm" appears to get stuck in a loop on the xconfig part. I pause it and it seems like the prompt is defining f77=-O and f77 FLAGS=y which doesn't work of course. My guess is the spec file doesn't have an answer for a previous question, so the /usr/bin/g77 answer is getting set for the previous prompt, and since no f77 is defined, it gets stuck.

Anyhow thought I would note this problem on the list for those more qualified to address it.


From bryan at UCLAlumni.net Tue Dec 9 12:14:16 2003


From: bryan at UCLAlumni.net (Bryan Littlefield)
Date: Tue, 09 Dec 2003 12:14:16 -0800
Subject: [Rocks-Discuss]Rocks-Discuss] AMD Opteron - Contact Appro
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

Hi Mason,

I suggest contacting Appro. We are using Rocks on our Opteron cluster and Appro would likely love to help. I will contact them as well to see if they could help with getting an Opteron machine for testing. Contact info below:

Thanks
--Bryan

Jian Chang - Regional Sales Manager
(408) 941-8100 x 202
(800) 927-5464 x 202
(408) 941-8111 Fax
jian at appro.com
http://www.appro.com

npaci-rocks-discussion-request at sdsc.edu wrote:

>From: "Mason J. Katz" <mjk at sdsc.edu>
>Subject: Re: [Rocks-Discuss]AMD Opteron
>Date: Tue, 9 Dec 2003 07:28:51 -0800
>To: "purushotham komaravolu" <purikk at hotmail.com>
>
>We have a beta right now that we have sent to a few people. We plan on
>a release this month, and AMD_64 will be part of this release along
>with the usual x86, IA64 support.
>
>If you want to help accelerate this process please talk to your vendor
>about loaning/giving us some hardware for testing. Having access to a
>variety of Opteron hardware (we own two boxes) is the only way we can
>have good support for this chip.
>
> -mjk
>
>On Dec 8, 2003, at 8:23 PM, purushotham komaravolu wrote:
>
>> Cc: <npaci-rocks-discussion at sdsc.edu>
>>
>>Hello,
>> I am a newbie to ROCKS cluster. I wanted to setup clusters on
>>32-bit Architectures( Intel and AMD) and 64-bit Architecture( Intel and
>>AMD).
>>I found the 64-bit download for Intel on the website but not for AMD. Does
>>it work for AMD opteron? if not what is the ETA for AMD-64.
>>We are planning to but AMD-64 bit machines shortly, and I would like to
>>volunteer for the beta testing if needed.


>>Thanks
>>Regards,
>>Puru
>
>_______________________________________________
>npaci-rocks-discussion mailing list
>npaci-rocks-discussion at sdsc.edu
>http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion
>
>End of npaci-rocks-discussion Digest
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/attachments/20031209/611e65b4/attachment-0001.html

From vincent_b_fox at yahoo.com Tue Dec 9 13:22:59 2003
From: vincent_b_fox at yahoo.com (Vincent Fox)
Date: Tue, 9 Dec 2003 13:22:59 -0800 (PST)
Subject: [Rocks-Discuss]ATLAS rpm build problems on PII platform
Message-ID: <[email protected]>

Okay, came up my own quick hack:

Edit atlas.spec.in, go to the "other x86" section, remove 2 lines right above "linux"; seems to make rpm now.

A more formal patch would be to put in a section for cpuid eq 4 with this correction, I suppose.


From landman at scalableinformatics.com Tue Dec 9 13:49:06 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Tue, 09 Dec 2003 16:49:06 -0500
Subject: [Rocks-Discuss]Has anyone tried Gaussian binary only on the ROCKS 3.1.0 beta?
Message-ID: <[email protected]>

Hi Folks

Working on building the same cluster from last week. The admin nodes are up and functional (plain old RH9+XFS).

I want to get the head nodes up, with one of the requirements being running the Gaussian binary-only code. Gaussian's page lists RH9.0 support, so I wanted to see if someone has tried the beta with this code.

Thanks.


Joe

--
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
phone: +1 734 612 4615

From landman at scalableinformatics.com Tue Dec 9 13:59:37 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Tue, 09 Dec 2003 16:59:37 -0500
Subject: [Rocks-Discuss]a name for pain ... modules/kernels/ethernets ...
Message-ID: <[email protected]>

Folks:

As indicated previously, I am wrestling with a Supermicro based cluster. None of the RH distributions come with the correct E1000 driver, so a new kernel is needed (in the boot CD, and for installation).

The problem I am running into is that it isn't at all obvious/easy how to install a new kernel/modules into ROCKS (3.0 or otherwise) to enable this thing to work. Following the examples in the documentation has not met with success. Running "rocks-dist cdrom" with the new kernels (2.4.23 works nicely on the nodes) in the force/RPMS directory generates a bootable CD with the original 2.4.18BOOT kernel.

What I (and I think others) need is a simple, easy to follow method that will generate a bootable CD with the correct Linux kernel, and the correct modules.

Is this in process somewhere? What would be tremendously helpful is if we can generate a binary module, and put that into the boot process by placing it into the force/modules/binary directory (assuming one exists) with the appropriate entry of a similar name in the force/modules/meta directory as a simple XML document giving pci-ids, description, name, etc.

Anything close to this coming? Modules are killing future ROCKS installs; the inability to easily inject a new module in there has created a problem whereby ROCKS does not function (as the underlying RH does not function).

--
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
phone: +1 734 612 4615


From tim.carlson at pnl.gov Tue Dec 9 14:11:43 2003
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Tue, 09 Dec 2003 14:11:43 -0800 (PST)
Subject: [Rocks-Discuss]a name for pain ... modules/kernels/ethernets ...
In-Reply-To: <[email protected]>
Message-ID: <[email protected]>

On Tue, 9 Dec 2003, Joe Landman wrote:

> The problem I am running into is that it isn't at all obvious/easy how
> to install a new kernel/modules into ROCKS (3.0 or otherwise) to enable
> this thing to work. Following the examples in the documentation have
> not met with success. Running "rocks-dist cdrom" with the new kernels
> (2.4.23 works nicely on the nodes) in the force/RPMS directory generates
> a bootable CD with the original 2.4.18BOOT kernel.

So you built a 2.4.23BOOT rpm? The problem people have is with the naming convention of kernels. A kernel.org spec file isn't going to generate proper kernel rpms IMHO. What you really want to do (and maybe you are already doing this) is steal the bit of the Redhat spec building scripts that generate the -smp, .i686, and BOOT rpms.

New hardware is tough for any distro.

Tim

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support

From tmartin at physics.ucsd.edu Tue Dec 9 15:57:17 2003
From: tmartin at physics.ucsd.edu (Terrence Martin)
Date: Tue, 09 Dec 2003 15:57:17 -0800
Subject: [Rocks-Discuss]Intel MT based Gigabit controllers
Message-ID: <[email protected]>

Does Rocks 3.0 support the Intel MT based Gigabit controllers (PCI 8086:1013) without any modifications? My new cluster has these new controllers.

Rocks 2.3.1 does not seem to detect/drive these cards correctly (the install fails to detect them and the e1000 driver does not seem to work). So I was going to go ahead and move my new head node to 3.0.0 and was wondering if I am going to have to do additional work to get the Intel drivers on the boot image (for cluster nodes) to have the working Intel driver with these cards.

Terrence

From tmartin at physics.ucsd.edu  Tue Dec  9 15:59:29 2003
From: tmartin at physics.ucsd.edu (Terrence Martin)
Date: Tue, 09 Dec 2003 15:59:29 -0800
Subject: [Rocks-Discuss]how to include custom driver
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

Tim Carlson wrote:
> On Mon, 9 Jun 2003, Greg Bruno wrote:
>
>> what driver did you have to add?
>>
>> we may be able to provide a patch for your compute nodes.
>
> Ah!!!.. I didn't see this repsonse before I sent off my reply to Matthew.
> Can I please have the aic79xx driver and while your at it can I get a
> module-info file that has this entry for gigabit? Not sure if it is
> already in there? ;)
>
> 0x8086 0x100f "e1000" "Intel Corp. 82545EM Gigabit Ethernet Controller rev (01)"
>
> It is also quite possible that I burned the 2.3.0 media instead of
> 2.3.2. It was late in the day when I tried to do my install.
>
> Tim
>
> Tim Carlson
> Voice: (509) 376 3423
> Email: Tim.Carlson at pnl.gov
> EMSL UNIX System Support

I would also like to request that this driver/change be made. I have a cluster with these newer Intel gigabit chipsets.

Terrence

From tmartin at physics.ucsd.edu  Tue Dec  9 16:33:18 2003
From: tmartin at physics.ucsd.edu (Terrence Martin)
Date: Tue, 09 Dec 2003 16:33:18 -0800
Subject: [Rocks-Discuss]a name for pain ... modules/kernels/ethernets ...
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

Tim Carlson wrote:
> On Tue, 9 Dec 2003, Joe Landman wrote:
>
>> The problem I am running into is that it isn't at all obvious/easy how
>> to install a new kernel/modules into ROCKS (3.0 or otherwise) to enable
>> this thing to work.  Following the examples in the documentation have
>> not met with success.  Running "rocks-dist cdrom" with the new kernels
>> (2.4.23 works nicely on the nodes) in the force/RPMS directory generates
>> a bootable CD with the original 2.4.18BOOT kernel.
>
> So you built a 2.4.23BOOT rpm? The problem people have is with the naming
> convention of kernels. A kernel.org spec file isn't going to generate
> proper kernel rpms IMHO. What you really want to do (and maybe you are
> already doing this) is steal the bit of the Redhat spec building scripts
> that generage the -smp .i686 and BOOT rpms.
>
> New hardware is tough for any distro.
>
> Tim
>
> Tim Carlson
> Voice: (509) 376 3423
> Email: Tim.Carlson at pnl.gov
> EMSL UNIX System Support

Where do you start if you want to update the PXE boot image to support a new kernel?

Terrence

From tmartin at physics.ucsd.edu  Tue Dec  9 16:58:08 2003
From: tmartin at physics.ucsd.edu (Terrence Martin)
Date: Tue, 09 Dec 2003 16:58:08 -0800
Subject: [Rocks-Discuss]Could not allocate requested partitions
Message-ID: <[email protected]>

I am getting the following error when trying to install a Rocks 3.0.0 headnode. The headnode works fine in Rocks 2.3.2.

Could not allocate requested partitions: Partitioning failed: Could not allocate partitions as primary partitions

What is also odd is that when I alt-F2 and run fdisk /dev/hda, it tells me it cannot find that device (unable to open /dev/hda). However, when I watch the boot messages, hda definitely comes up. Also, the headnode works fine with 2.3.2.

Any ideas?

Terrence

From tmartin at physics.ucsd.edu  Tue Dec  9 17:33:24 2003
From: tmartin at physics.ucsd.edu (Terrence Martin)
Date: Tue, 09 Dec 2003 17:33:24 -0800
Subject: [Rocks-Discuss]Could not allocate requested partitions
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

Terrence Martin wrote:
> I am getting the following error when trying to install a Rocks 3.0.0
> headnode. The headnode works find in rocks 2.3.2.
>
> Could not allocate requested partitions: Partitioning failed: Could not
> allocate partitions as primary partitions
>
> What is also odd is when I alt-f2 and run fdisk /dev/hda it tells me it
> cannot find that device (unable to open /dev/hda). However when I watch
> the boot messages hda definitely comes up. Also the headnode works fine
> with 2.3.2.
>
> Any ideas?
>
> Terrence

Figured it out; apparently Rocks 3.0.0 did not like my partitions from Rocks 2.3.2. I booted Knoppix, blew away the partition table, and so far so good on the head node.

Terrence

From mjk at sdsc.edu  Tue Dec  9 17:54:01 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Tue, 9 Dec 2003 17:54:01 -0800
Subject: [Rocks-Discuss]a name for pain ... modules/kernels/ethernets ...
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

If the underlying RedHat doesn't support your hardware you are pretty much dead in the water. We do at times include drivers that RH does not, but this is an exception and only for hardware we physically have access to. The rocks-boot package (rocks/src/rocks/boot in CVS) controls the boot kernel and module selection. You can look into this to see what it would take to add your own module. We do plan on refining and documenting this, but not for several months. We also have some very good ideas on how we can track this faster than RH, but again nothing coming in the next few months.

To continue my earlier rant for today: until more hardware vendors start taking the Linux marketplace seriously, buying bleeding-edge hardware and CPUs is asking for problems. It takes several months for any new hardware to become supported by RedHat, and several years for any new CPU to be supported well. This isn't killing future Rocks installs, it's just correctly delaying them until the underlying OS supports the hardware.

-mjk

On Dec 9, 2003, at 1:59 PM, Joe Landman wrote:

> Folks:
>
>   As indicated previously, I am wrestling with a Supermicro based
> cluster.  None of the RH distributions come with the correct E1000
> driver, so a new kernel is needed (in the boot CD, and for
> installation).
>
>   The problem I am running into is that it isn't at all obvious/easy how
> to install a new kernel/modules into ROCKS (3.0 or otherwise) to enable
> this thing to work.  Following the examples in the documentation have
> not met with success.  Running "rocks-dist cdrom" with the new kernels
> (2.4.23 works nicely on the nodes) in the force/RPMS directory generates
> a bootable CD with the original 2.4.18BOOT kernel.
>
>   What I (and I think others) need, is a simple/easy to follow method
> that will generate a bootable CD with the correct linux kernel, and the
> correct modules.
>
>   Is this in process somewhere?  What would be tremendously helpful is
> if we can generate a binary module, and put that into the boot process
> by placing it into the force/modules/binary directory (assuming one
> exists) with the appropriate entry of a similar name in the
> force/modules/meta directory as a simple XML document giving pci-ids,
> description, name, etc.
>
>   Anything close to this coming?  Modules are killing future ROCKS
> installs, the inability to easily inject a new module in there has
> created a problem whereby ROCKS does not function (as the underlying RH
> does not function).
>
> --
> Joseph Landman, Ph.D
> Scalable Informatics LLC,
> email: landman at scalableinformatics.com
> web  : http://scalableinformatics.com
> phone: +1 734 612 4615

From gotero at linuxprophet.com  Tue Dec  9 18:02:23 2003
From: gotero at linuxprophet.com (gotero at linuxprophet.com)
Date: Tue, 09 Dec 2003 18:02:23 -0800 (PST)
Subject: [Rocks-Discuss]custom-kernels : naming conventions ? (Rocks 3.0.0)
Message-ID: <20031209180224.24711.h014.c001.wm@mail.linuxprophet.com.criticalpath.net>

Daniel-

I recently had the same problem when building a quadrics cluster on Rocks 2.3.2 with the qsnet-RedHat-kernel-2.4.18-27.3.4qsnet.i686.rpms. The problem is definitely in the naming of the rpms, in that anaconda running on the compute nodes is not going to recognize kernel rpms that begin with 'qsnet' as potential boot options. Unfortunately, being under a severe time constraint, I resorted to manually installing the qsnet kernel on all nodes of the cluster, which isn't the Rocks way. The long-term solution is to mangle the kernel makefiles so that the qsnet kernel rpms have conventional kernel rpm names, which is what Greg's post referred to.

Glen

On Mon, 8 Dec 2003 17:54:53 -0000, daniel.kidger at quadrics.com wrote:

> Dear all,
> Previously I have been installing a custom kernel on the compute nodes
> with an "extend-compute.xml" and an "/etc/init.d/qsconfigure" (to fix grub.conf).
>
> However I am now trying to do it the 'proper' way. So I do (on :
> # cp qsnet-RedHat-kernel-2.4.18-27.3.10qsnet.i686.rpm \
>      /home/install/rocks-dist/7.3/en/os/i386/force/RPMS
> # cd /home/install
> # rocks-dist dist
> # SSH_NO_PASSWD=1 shoot-node compute-0-0
>
> Hence:
> # find /home/install/ |xargs -l grep -nH qsnet
> shows me that hdlist and hdlist2 now contain this RPM. (and indeed if I
> duplicate my rpm in that directory rocks-dist notices this and warns me.)
>
> However the node always ends up with "2.4.20-20.7smp" again.
> anaconda-ks.cfg contains just "kernel-smp" and install.log has "Installing
> kernel-smp-2.4.20-20.7."
>
> So my question is:
> It looks like my RPM has a name that Rocks doesn't understand properly.
> What is wrong with my name? And what are the rules for getting the
> correct name?
> (.i686.rpm is of course correct, but I don't have -smp. in the name. Is
> this the problem?)
>
> cf. Greg Bruno's wisdom:
> https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-April/001770.html
>
> Yours,
> Daniel.
>
> --------------------------------------------------------------
> Dr. Dan Kidger, Quadrics Ltd.      daniel.kidger at quadrics.com
> One Bridewell St., Bristol, BS1 2AA, UK         0117 915 5505
> ----------------------- www.quadrics.com --------------------

Glen Otero, Ph.D.
Linux Prophet

From gotero at linuxprophet.com  Tue Dec  9 18:05:04 2003
From: gotero at linuxprophet.com (gotero at linuxprophet.com)
Date: Tue, 09 Dec 2003 18:05:04 -0800 (PST)
Subject: [Rocks-Discuss]Could not allocate requested partitions
Message-ID: <20031209180504.716.h014.c001.wm@mail.linuxprophet.com.criticalpath.net>

On Tue, 09 Dec 2003 17:33:24 -0800, Terrence Martin wrote:

> Terrence Martin wrote:
> > I am getting the following error when trying to install a Rocks 3.0.0
> > headnode. The headnode works find in rocks 2.3.2.
> >
> > Could not allocate requested partitions: Partitioning failed: Could not
> > allocate partitions as primary partitions
> >
> > What is also odd is when I alt-f2 and run fdisk /dev/hda it tells me it
> > cannot find that device (unable to open /dev/hda). However when I watch
> > the boot messages hda definitely comes up. Also the headnode works fine
> > with 2.3.2.
> >
> > Any ideas?
> >
> > Terrence
>
> Figured it out, aparently rocks 3.0.0 did not like my partitions from
> rocks 2.3.2. I booted knoppix, blew away the partition table and so far
> so good on the head node.

I had the same problem with moving from 2.3.2 to 3.1. I'll try your solution.

Glen

> > Terrence

Glen Otero, Ph.D.
Linux Prophet

From jorge at phys.ufl.edu  Tue Dec  9 18:55:02 2003
From: jorge at phys.ufl.edu (Jorge L. Rodriguez)
Date: Tue, 09 Dec 2003 21:55:02 -0500
Subject: [Rocks-Discuss]Adding partitions that are not reformatted under hard boots or shoot-node
Message-ID: <[email protected]>

Hi,

How do I add an extra partition to my compute nodes and retain the data on all non-/ partitions when the system hard boots or is shot? I tried the suggestion in the documentation under "Customizing your ROCKS Installation" where you replace auto-partition.xml, but hard boots or shoot-node on these nodes reformat all partitions instead of just /. I have also tried to modify installclass.xml so that an extra partition is added into the python code (see below). This does mostly what I want, but now I can't shoot-node, even though a hard boot reinstalls without reformatting anything but /. Is this the right approach? I'd rather avoid having to replace installclass, since I don't really want to partition all nodes this way, but if I must I will.

Jorge

        #
        # set up the root partition
        #
        args = [ "/" , "--size" , "4096", "--fstype", "&fstype;",
                 "--ondisk", devnames[0] ]
        KickstartBase.definePartition(self, id, args)

        # ---- Jorge, I added this args
        args = [ "/state/partition1" , "--size" , "55000", "--fstype", "&fstype;",
                 "--ondisk", devnames[0] ]
        KickstartBase.definePartition(self, id, args)
        # -----

        args = [ "swap" , "--size" , "1000", "--ondisk", devnames[0] ]
        KickstartBase.definePartition(self, id, args)

        #
        # greedy partitioning
        #
        # ----- Jorge, I change this from i = 1
        i = 2
        # -----
        for devname in devnames:
            partname = "/state/partition%d" % (i)
            args = [ partname, "--size", "1", "--fstype", "&fstype;",
                     "--grow", "--ondisk", devname ]
            KickstartBase.definePartition(self, id, args)

            i = i + 1

From bruno at rocksclusters.org  Tue Dec  9 22:43:04 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Tue, 9 Dec 2003 22:43:04 -0800
Subject: [Rocks-Discuss]ATLAS rpm build problems on PII platform
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

> Okay, came up my own quick hack:
>
> Edit atlas.spec.in, go to "other x86" section, remove
> 2 lines right above "linux", seems to make rpm now.
>
> A more formal patch would be put in a section for
> cpuid eq 4 with this correction I suppose.

if you provide the patch, we'll include it in our next release.

- gb

From tlw at cs.unm.edu  Tue Dec  9 23:23:43 2003
From: tlw at cs.unm.edu (Tiffani Williams)
Date: Wed, 10 Dec 2003 00:23:43 -0700
Subject: [Rocks-Discuss]PBS errors
Message-ID: <[email protected]>

Hello,

I am trying to submit a job through PBS, but I receive 2 errors. The first error is

  Job cannot be executed
  See job standard error file

The second error is that the standard error file cannot be written into my home directory.

I downloaded the sample script at http://rocks.npaci.edu/papers/rocks-documentation/launching-batch-jobs.html and have tried a simpler script with PBS directives and echo commands.

I do not know what I am doing wrong. I have used PBS successfully on other clusters.

Does anyone have any suggestions?

Tiffani

From bruno at rocksclusters.org  Tue Dec  9 23:35:59 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Tue, 9 Dec 2003 23:35:59 -0800
Subject: [Rocks-Discuss]PBS errors
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

> I am trying to submit a job through PBS, but I receive 2 errors. The
> first error is
> Job cannot be executed
> See job standard error file
>
> The second error is that the standard error file cannot be written
> into my home directory.
> I downloaded the sample script at
> http://rocks.npaci.edu/papers/rocks-documentation/launching-batch-jobs.html
> and have tried a more simple script with PBS directives and echo
> commands.
>
> I do not know what I am doing wrong? I have used PBS successfully on
> other clusters.
>
> Does anyone have any suggestions?

can you login to the compute nodes successfully?

if not, try restarting autofs on all the compute nodes. on the frontend, execute:

# ssh-agent $SHELL
# ssh-add

# cluster-fork "/etc/rc.d/init.d/autofs restart"

we've found the startup of autofs to be flaky at times.

- gb

From tlw at cs.unm.edu  Wed Dec 10 00:03:13 2003
From: tlw at cs.unm.edu (Tiffani Williams)
Date: Wed, 10 Dec 2003 01:03:13 -0700
Subject: [Rocks-Discuss]PBS errors
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]>
Message-ID: <[email protected]>

>> I am trying to submit a job through PBS, but I receive 2 errors.
>> The first error is
>> Job cannot be executed
>> See job standard error file
>>
>> The second error is that the standard error file cannot be written
>> into my home directory.
>> I downloaded the sample script at
>> http://rocks.npaci.edu/papers/rocks-documentation/launching-batch-jobs.html
>> and have tried a more simple script with PBS directives and echo
>> commands.
>>
>> I do not know what I am doing wrong? I have used PBS successfully
>> on other clusters.
>>
>> Does anyone have any suggestions?
>
> can you login to the compute nodes successfully?
>
> if not, try restarting autofs on all the compute nodes. on the
> frontend, execute:
>
> # ssh-agent $SHELL
> # ssh-add
>
> # cluster-fork "/etc/rc.d/init.d/autofs restart"
>
> we've found the startup of autofs to be flaky at times.
>
> - gb

Do these commands have to be run by an administrator? If so, I do not have such privileges. I can ssh to the compute nodes, but I am denied entry. Am I supposed to be able to login to a compute node as a user?

Tiffani

From bruno at rocksclusters.org  Wed Dec 10 06:37:05 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Wed, 10 Dec 2003 06:37:05 -0800
Subject: [Rocks-Discuss]PBS errors
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]> <[email protected]>
Message-ID: <[email protected]>

On Dec 10, 2003, at 12:03 AM, Tiffani Williams wrote:

>>> I am trying to submit a job through PBS, but I receive 2 errors.
>>> The first error is
>>> Job cannot be executed
>>> See job standard error file
>>>
>>> The second error is that the standard error file cannot be written
>>> into my home directory.
>>> I downloaded the sample script at
>>> http://rocks.npaci.edu/papers/rocks-documentation/launching-batch-jobs.html
>>> and have tried a more simple script with PBS directives and echo
>>> commands.
>>>
>>> I do not know what I am doing wrong? I have used PBS successfully
>>> on other clusters.
>>>
>>> Does anyone have any suggestions?
>>
>> can you login to the compute nodes successfully?
>>
>> if not, try restarting autofs on all the compute nodes. on the
>> frontend, execute:
>>
>> # ssh-agent $SHELL
>> # ssh-add
>>
>> # cluster-fork "/etc/rc.d/init.d/autofs restart"
>>
>> we've found the startup of autofs to be flaky at times.
>>
>> - gb
>
> Do these commands have to be run by an administrator? If so, I do not
> have such privileges. I can ssh to the compute nodes, but I am denied
> entry. Am I supposed to be able to login to a compute node as a user.

yes, you need to be 'root'.

it appears your home directory is not being mounted when you login -- have your administrator run the commands above.

- gb

From mjk at sdsc.edu  Wed Dec 10 07:20:47 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Wed, 10 Dec 2003 07:20:47 -0800
Subject: [Rocks-Discuss]PBS errors
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]> <[email protected]> <[email protected]>
Message-ID: <[email protected]>

This is most likely the dreaded NIS-crash. You'll need to restart the ypserver on the frontend and the ypbind daemon on all the nodes. We've seen this on our clusters maybe 4 times (on production systems) in the last several years. Others have seen this on a weekly basis. This is why NIS is dead in Rocks 3.1 - it served us reasonably well but never matured to a stable system.

-mjk

On Dec 10, 2003, at 6:37 AM, Greg Bruno wrote:

> On Dec 10, 2003, at 12:03 AM, Tiffani Williams wrote:
>
>>>> I am trying to submit a job through PBS, but I receive 2 errors.
>>>> The first error is
>>>> Job cannot be executed
>>>> See job standard error file
>>>>
>>>> The second error is that the standard error file cannot be written
>>>> into my home directory.
>>>> I downloaded the sample script at
>>>> http://rocks.npaci.edu/papers/rocks-documentation/launching-batch-jobs.html
>>>> and have tried a more simple script with PBS directives and echo
>>>> commands.
>>>>
>>>> I do not know what I am doing wrong? I have used PBS successfully
>>>> on other clusters.
>>>>
>>>> Does anyone have any suggestions?
>>>
>>> can you login to the compute nodes successfully?
>>>
>>> if not, try restarting autofs on all the compute nodes. on the
>>> frontend, execute:
>>>
>>> # ssh-agent $SHELL
>>> # ssh-add
>>>
>>> # cluster-fork "/etc/rc.d/init.d/autofs restart"
>>>
>>> we've found the startup of autofs to be flaky at times.
>>>
>>> - gb
>>
>> Do these commands have to be run by an administrator? If so, I do not
>> have such privileges. I can ssh to the compute nodes, but I am
>> denied entry. Am I supposed to be able to login to a compute node as
>> a user.
>
> yes, you need to be 'root'.
>
> it appears your home directory is not being mounted when you login --
> have your administrator run the commands above.
>
> - gb

From vincent_b_fox at yahoo.com  Wed Dec 10 07:59:14 2003
From: vincent_b_fox at yahoo.com (Vincent Fox)
Date: Wed, 10 Dec 2003 07:59:14 -0800 (PST)
Subject: [Rocks-Discuss]one node short in "labels"
Message-ID: <[email protected]>

So I go to the "labels" selection on the web page to print out the pretty labels. What a nice idea by the way! EXCEPT....it's one node short! I go up to 0-13 and this stops at 0-12. Any ideas where I should check to fix this?

---------------------------------
Do you Yahoo!?
New Yahoo! Photos - easier uploading and sharing
-------------- next part --------------
An HTML attachment was scrubbed...
URL: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/attachments/20031210/c5bf5e79/attachment-0001.html

From cdwan at mail.ahc.umn.edu  Wed Dec 10 12:04:53 2003
From: cdwan at mail.ahc.umn.edu (Chris Dwan (CCGB))
Date: Wed, 10 Dec 2003 14:04:53 -0600 (CST)
Subject: [Rocks-Discuss]Non-homogenous legacy hardware
Message-ID: <[email protected]>

I am integrating legacy systems into a ROCKS cluster, and have hit a
snag with the auto-partition configuration: The new (old) systems have
SCSI disks, while old (new) ones contain IDE. This is a non-issue so
long as the initial install does its default partitioning. However, I
have a "replace-auto-partition.xml" file which is unworkable for the SCSI
based systems since it makes specific reference to "hda" rather than
"sda."

I would like to have a site-nodes/replace-auto-partition.xml file with a
conditional such that "hda" or "sda" is used, based on the name of the
node (or some other criterion).

Is this possible?

Thanks, in advance. If this is out there on the mailing list archives, a
pointer would be greatly appreciated.

-Chris Dwan
 The University of Minnesota

From tmartin at physics.ucsd.edu  Wed Dec 10 12:09:11 2003
From: tmartin at physics.ucsd.edu (Terrence Martin)
Date: Wed, 10 Dec 2003 12:09:11 -0800
Subject: [Rocks-Discuss]Error during Make when building a new install floppy
Message-ID: <[email protected]>

I get the following error when I try to rebuild a boot floppy for rocks.

This is with the default CVS checkout with an update today according to the rocks userguide. I have not actually attempted to make any changes.

make[3]: Leaving directory `/home/install/rocks/src/rocks/boot/7.3/loader/anaconda-7.3/loader'
make[2]: Leaving directory `/home/install/rocks/src/rocks/boot/7.3/loader/anaconda-7.3'
strip -o loader anaconda-7.3/loader/loader
strip: anaconda-7.3/loader/loader: No such file or directory
make[1]: *** [loader] Error 1
make[1]: Leaving directory `/home/install/rocks/src/rocks/boot/7.3/loader'
make: *** [loader] Error 2

Of course I could avoid all of this altogether and just put my binary module into the appropriate location in the boot image.

Would it be correct to modify the following image file with my changes and then write it to a floppy via dd?

/home/install/ftp.rocksclusters.org/pub/rocks/rocks-3.0.0/rocks-dist/7.3/en/os/i386/images/bootnet.img

Basically I am injecting an updated e1000 driver with changes to pcitable to support the address of my gigabit cards.

Terrence

From tim.carlson at pnl.gov  Wed Dec 10 12:40:41 2003
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Wed, 10 Dec 2003 12:40:41 -0800 (PST)
Subject: [Rocks-Discuss]Error during Make when building a new install floppy
In-Reply-To: <[email protected]>
Message-ID: <[email protected]>

On Wed, 10 Dec 2003, Terrence Martin wrote:

> I get the following error when I try to rebuild a boot floppy for rocks.>

You can't make a boot floppy with Rocks 3.0. That isn't supported. Or at least it wasn't the last time I checked.

> Of course I could avoid all of this together and just put my binary
> module into the appropriate location in the boot image.
>
> Would it be correct to modify the following image file with my changes
> and then write it to a floppy via dd?
>
> /home/install/ftp.rocksclusters.org/pub/rocks/rocks-3.0.0/rocks-dist/7.3/en/os/i386/images/bootnet.img
>
> Basically I am injecting an updated e1000 driver with changes to
> pcitable to support the address of my gigabit cards.

Modifying the bootnet.img is about 1/3 of what you need to do if you go down that path. You also need to work on netstg1.img, and you'll need to update the driver in the kernel rpm that gets installed on the box. None of this is trivial.

If it were me, I would go down the same path I took for updating the AIC79XX driver:

https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-October/003533.html

Tim

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support

From tim.carlson at pnl.gov  Wed Dec 10 12:52:38 2003
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Wed, 10 Dec 2003 12:52:38 -0800 (PST)
Subject: [Rocks-Discuss]Non-homogenous legacy hardware
In-Reply-To: <[email protected]>
Message-ID: <[email protected]>

On Wed, 10 Dec 2003, Chris Dwan (CCGB) wrote:

> I am integrating legacy systems into a ROCKS cluster, and have hit a
> snag with the auto-partition configuration: The new (old) systems have
> SCSI disks, while old (new) ones contain IDE. This is a non-issue so
> long as the initial install does its default partitioning. However, I
> have a "replace-auto-partition.xml" file which is unworkable for the SCSI
> based systems since it makes specific reference to "hda" rather than
> "sda."

If you have just a single drive, then you should be able to skip the "--ondisk" bits of your "part" command.

Otherwise, you would first have to do something ugly like the following:

http://penguin.epfl.ch/slides/kickstart/ks.cfg

You could probably (maybe) wrap most of that in an

<eval sh="bash"></eval>

block in the <main> block.

Just guessing.. haven't tried this.
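The conditional Tim sketches reduces to picking the disk name once and emitting the same "part" arguments either way. A minimal sketch in the installclass style seen elsewhere in this thread (Jorge's definePartition arguments); the helper and its probe flag are hypothetical, and a real version would make the decision at kickstart-generation time, e.g. inside the <eval sh="bash"> block suggested above:

```python
# Hypothetical helper: build the argument lists for definePartition()
# for either an IDE (hda) or SCSI (sda) first disk. Sizes follow the
# examples earlier in this thread; only the disk name varies.
def partition_args(has_ide_disk):
    disk = 'hda' if has_ide_disk else 'sda'
    return [
        ["/", "--size", "4096", "--ondisk", disk],
        ["swap", "--size", "1000", "--ondisk", disk],
    ]

# e.g. for a SCSI-only legacy node:
for args in partition_args(has_ide_disk=False):
    print(args)
```

The point is that only the probe differs per node; the partition layout itself stays in one place.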

Tim

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support

From agrajag at dragaera.net  Wed Dec 10 10:21:07 2003
From: agrajag at dragaera.net (Jag)
Date: Wed, 10 Dec 2003 13:21:07 -0500
Subject: [Rocks-Discuss]ssh_known_hosts and ganglia
Message-ID: <1071080467.4693.6.camel@pel>

I noticed a previous post on this list
(https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-May/001934.html)
indicating that Rocks distributes ssh keys for all the nodes over
ganglia. Can anyone enlighten me as to how this is done?

I looked through the ganglia docs and didn't see anything indicating how
to do this, so I'm assuming Rocks made some changes. Unfortunately the
rocks iso images don't seem to contain srpms, so I'm now coming here.
What did Rocks do to ganglia to make the distribution of ssh keys work?

Also, does anyone know where Rocks SRPMs can be found? I've done quite
a bit of searching, but haven't found them anywhere.

From mjk at sdsc.edu  Wed Dec 10 14:39:15 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Wed, 10 Dec 2003 14:39:15 -0800
Subject: [Rocks-Discuss]ssh_known_hosts and ganglia
In-Reply-To: <1071080467.4693.6.camel@pel>
References: <1071080467.4693.6.camel@pel>
Message-ID: <[email protected]>

Most of the SRPMS are on our FTP site, but we've screwed this up before. The SRPMS are entirely Rocks specific, so they are of little value outside of Rocks. You can also check out our CVS tree (cvs.rocksclusters.org), where rocks/src/ganglia shows what we add. We have a ganglia-python package we created to allow us to write our own metrics at a higher level than the provided gmetric application. We've also moved from this method to a single cluster-wide ssh key for Rocks 3.1.

-mjk

On Dec 10, 2003, at 10:21 AM, Jag wrote:

> I noticed a previous post on this list
> (https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-May/001934.html)
> indicating that Rocks distributes ssh keys for all the nodes over
> ganglia. Can anyone enlighten me as to how this is done?
>
> I looked through the ganglia docs and didn't see anything indicating how
> to do this, so I'm assuming Rocks made some changes. Unfortunately the
> rocks iso images don't seem to contain srpms, so I'm now coming here.
> What did Rocks do to ganglia to make the distribution of ssh keys work?
>
> Also, does anyone know where Rocks SRPMs can be found? I've done quite
> a bit of searching, but haven't found them anywhere.

From vrowley at ucsd.edu  Wed Dec 10 14:43:49 2003
From: vrowley at ucsd.edu (V. Rowley)
Date: Wed, 10 Dec 2003 14:43:49 -0800
Subject: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying to build CD distro
Message-ID: <[email protected]>

When I run this:

[root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ; rocks-dist --dist=cdrom cdrom

on a server installed with ROCKS 3.0.0, I eventually get this:

> Cleaning distribution
> Resolving versions (RPMs)
> Resolving versions (SRPMs)
> Adding support for rebuild distribution from source
> Creating files (symbolic links - fast)
> Creating symlinks to kickstart files
> Fixing Comps Database
> Generating hdlist (rpm database)
> Patching second stage loader (eKV, partioning, ...)
>         patching "rocks-ekv" into distribution ...
>         patching "rocks-piece-pipe" into distribution ...
>         patching "PyXML" into distribution ...
>         patching "expat" into distribution ...
>         patching "rocks-pylib" into distribution ...
>         patching "MySQL-python" into distribution ...
>         patching "rocks-kickstart" into distribution ...
>         patching "rocks-kickstart-profiles" into distribution ...
>         patching "rocks-kickstart-dtds" into distribution ...
>         building CRAM filesystem ...
> Cleaning distribution
> Resolving versions (RPMs)
> Resolving versions (SRPMs)
> Creating symlinks to kickstart files
> Generating hdlist (rpm database)
> Segregating RPMs (rocks, non-rocks)
> sh: ./kickstart.cgi: No such file or directory
> sh: ./kickstart.cgi: No such file or directory
> Traceback (innermost last):
>   File "/opt/rocks/bin/rocks-dist", line 807, in ?
>     app.run()
>   File "/opt/rocks/bin/rocks-dist", line 623, in run
>     eval('self.command_%s()' % (command))
>   File "<string>", line 0, in ?
>   File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom
>     builder.build()
>   File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build
>     (rocks, nonrocks) = self.segregateRPMS()
>   File "/opt/rocks/lib/python/rocks/build.py", line 1107, in segregateRPMS
>     for pkg in ks.getSection('packages'):
> TypeError: loop over non-sequence

Any ideas?

--
Vicky Rowley                              email: vrowley at ucsd.edu
Biomedical Informatics Research Network   work: (858) 536-5980
University of California, San Diego       fax: (858) 822-0828
9500 Gilman Drive
La Jolla, CA 92093-0715

See pictures from our trip to China at http://www.sagacitech.com/Chinaweb

From bruno at rocksclusters.org  Wed Dec 10 15:12:49 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Wed, 10 Dec 2003 15:12:49 -0800
Subject: [Rocks-Discuss]one node short in "labels"
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

> So I go to the "labels" selection on the web page to print out the
> pretty labels. What a nice idea by the way!
>
> EXCEPT....it's one node short! I go up to 0-13 and this stops at
> 0-12. Any ideas where I should check to fix this?

yeah, we found this corner case -- it'll be fixed in the next release.

thanks for the bug report.

- gb

From mjk at sdsc.edu  Wed Dec 10 15:16:27 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Wed, 10 Dec 2003 15:16:27 -0800
Subject: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying to build CD distro
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

It looks like someone moved the profiles directory to profiles.orig.

-mjk

[root at rocks14 install]# ls -l
total 56
drwxr-sr-x    3 root     wheel        4096 Dec 10 21:16 cdrom
drwxrwsr-x    5 root     wheel        4096 Dec 10 20:38 contrib.orig
drwxr-sr-x    3 root     wheel        4096 Dec 10 21:07 ftp.rocksclusters.org
drwxr-sr-x    3 root     wheel        4096 Dec 10 20:38 ftp.rocksclusters.org.orig
-r-xrwsr-x    1 root     wheel       19254 Sep  3 12:40 kickstart.cgi
drwxr-xr-x    3 root     root         4096 Dec 10 20:38 profiles.orig
drwxr-sr-x    3 root     wheel        4096 Dec 10 21:15 rocks-dist
drwxrwsr-x    3 root     wheel        4096 Dec 10 20:38 rocks-dist.orig
drwxr-sr-x    3 root     wheel        4096 Dec 10 21:02 src
drwxr-sr-x    4 root     wheel        4096 Dec 10 20:49 src.foo

On Dec 10, 2003, at 2:43 PM, V. Rowley wrote:

> When I run this:
>
> [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ;
> rocks-dist --dist=cdrom cdrom
>
> on a server installed with ROCKS 3.0.0, I eventually get this:
>
>> Cleaning distribution
>> Resolving versions (RPMs)
>> Resolving versions (SRPMs)
>> Adding support for rebuild distribution from source
>> Creating files (symbolic links - fast)
>> Creating symlinks to kickstart files
>> Fixing Comps Database
>> Generating hdlist (rpm database)
>> Patching second stage loader (eKV, partioning, ...)
>> patching "rocks-ekv" into distribution ...
>> patching "rocks-piece-pipe" into distribution ...
>> patching "PyXML" into distribution ...
>> patching "expat" into distribution ...
>> patching "rocks-pylib" into distribution ...
>> patching "MySQL-python" into distribution ...
>> patching "rocks-kickstart" into distribution ...
>> patching "rocks-kickstart-profiles" into distribution ...
>> patching "rocks-kickstart-dtds" into distribution ...
>> building CRAM filesystem ...
>> Cleaning distribution
>> Resolving versions (RPMs)
>> Resolving versions (SRPMs)
>> Creating symlinks to kickstart files
>> Generating hdlist (rpm database)
>> Segregating RPMs (rocks, non-rocks)
>> sh: ./kickstart.cgi: No such file or directory
>> sh: ./kickstart.cgi: No such file or directory
>> Traceback (innermost last):
>>   File "/opt/rocks/bin/rocks-dist", line 807, in ?
>>     app.run()
>>   File "/opt/rocks/bin/rocks-dist", line 623, in run
>>     eval('self.command_%s()' % (command))
>>   File "<string>", line 0, in ?
>>   File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom
>>     builder.build()
>>   File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build
>>     (rocks, nonrocks) = self.segregateRPMS()
>>   File "/opt/rocks/lib/python/rocks/build.py", line 1107, in segregateRPMS
>>     for pkg in ks.getSection('packages'):
>> TypeError: loop over non-sequence
>
> Any ideas?
>
> --
> Vicky Rowley                               email: vrowley at ucsd.edu
> Biomedical Informatics Research Network    work: (858) 536-5980
> University of California, San Diego        fax: (858) 822-0828
> 9500 Gilman Drive
> La Jolla, CA 92093-0715
>
>
> See pictures from our trip to China at
> http://www.sagacitech.com/Chinaweb

From vrowley at ucsd.edu Wed Dec 10 16:50:16 2003
From: vrowley at ucsd.edu (V. Rowley)
Date: Wed, 10 Dec 2003 16:50:16 -0800
Subject: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying to build CD distro
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]>
Message-ID: <[email protected]>

Yep, I did that, but only *AFTER* getting the error. [Thought it was generated by the rocks-dist sequence, but apparently not.] Go ahead. Move it back. Same difference.

Vicky

Mason J. Katz wrote:
> It looks like someone moved the profiles directory to profiles.orig.
>
> -mjk
>
>
> [root at rocks14 install]# ls -l
> total 56
> drwxr-sr-x    3 root     wheel        4096 Dec 10 21:16 cdrom
> drwxrwsr-x    5 root     wheel        4096 Dec 10 20:38 contrib.orig
> drwxr-sr-x    3 root     wheel        4096 Dec 10 21:07 ftp.rocksclusters.org
> drwxr-sr-x    3 root     wheel        4096 Dec 10 20:38 ftp.rocksclusters.org.orig
> -r-xrwsr-x    1 root     wheel       19254 Sep  3 12:40 kickstart.cgi
> drwxr-xr-x    3 root     root         4096 Dec 10 20:38 profiles.orig
> drwxr-sr-x    3 root     wheel        4096 Dec 10 21:15 rocks-dist
> drwxrwsr-x    3 root     wheel        4096 Dec 10 20:38 rocks-dist.orig
> drwxr-sr-x    3 root     wheel        4096 Dec 10 21:02 src
> drwxr-sr-x    4 root     wheel        4096 Dec 10 20:49 src.foo
> On Dec 10, 2003, at 2:43 PM, V. Rowley wrote:
>
>> When I run this:
>>
>> [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ;
>> rocks-dist --dist=cdrom cdrom
>>
>> on a server installed with ROCKS 3.0.0, I eventually get this:
>>
>>> Cleaning distribution
>>> Resolving versions (RPMs)
>>> Resolving versions (SRPMs)
>>> Adding support for rebuild distribution from source
>>> Creating files (symbolic links - fast)
>>> Creating symlinks to kickstart files
>>> Fixing Comps Database
>>> Generating hdlist (rpm database)
>>> Patching second stage loader (eKV, partioning, ...)
>>> patching "rocks-ekv" into distribution ...
>>> patching "rocks-piece-pipe" into distribution ...
>>> patching "PyXML" into distribution ...
>>> patching "expat" into distribution ...
>>> patching "rocks-pylib" into distribution ...
>>> patching "MySQL-python" into distribution ...
>>> patching "rocks-kickstart" into distribution ...
>>> patching "rocks-kickstart-profiles" into distribution ...
>>> patching "rocks-kickstart-dtds" into distribution ...
>>> building CRAM filesystem ...
>>> Cleaning distribution
>>> Resolving versions (RPMs)
>>> Resolving versions (SRPMs)
>>> Creating symlinks to kickstart files
>>> Generating hdlist (rpm database)
>>> Segregating RPMs (rocks, non-rocks)
>>> sh: ./kickstart.cgi: No such file or directory
>>> sh: ./kickstart.cgi: No such file or directory
>>> Traceback (innermost last):
>>>   File "/opt/rocks/bin/rocks-dist", line 807, in ?
>>>     app.run()
>>>   File "/opt/rocks/bin/rocks-dist", line 623, in run
>>>     eval('self.command_%s()' % (command))
>>>   File "<string>", line 0, in ?
>>>   File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom
>>>     builder.build()
>>>   File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build
>>>     (rocks, nonrocks) = self.segregateRPMS()
>>>   File "/opt/rocks/lib/python/rocks/build.py", line 1107, in segregateRPMS
>>>     for pkg in ks.getSection('packages'):
>>> TypeError: loop over non-sequence
>>
>>
>> Any ideas?
>>
>> --
>> Vicky Rowley                               email: vrowley at ucsd.edu
>> Biomedical Informatics Research Network    work: (858) 536-5980
>> University of California, San Diego        fax: (858) 822-0828
>> 9500 Gilman Drive
>> La Jolla, CA 92093-0715
>>
>>
>> See pictures from our trip to China at
>> http://www.sagacitech.com/Chinaweb
>
>
>

--
Vicky Rowley                               email: vrowley at ucsd.edu
Biomedical Informatics Research Network    work: (858) 536-5980
University of California, San Diego        fax: (858) 822-0828
9500 Gilman Drive
La Jolla, CA 92093-0715

See pictures from our trip to China at http://www.sagacitech.com/Chinaweb

From tim.carlson at pnl.gov Wed Dec 10 17:23:25 2003
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Wed, 10 Dec 2003 17:23:25 -0800 (PST)
Subject: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying to build CD distro
In-Reply-To: <[email protected]>
Message-ID: <[email protected]>

On Wed, 10 Dec 2003, V. Rowley wrote:

Did you remove python by chance? kickstart.cgi calls python directly in
/usr/bin/python while rocks-dist does an "env python"

Tim
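Tim's diagnosis can be checked in a few lines. A sketch (the helper name is ours; only the two lookups, the PATH search done by "env python" and the hard-coded /usr/bin/python, come from the thread):

```python
import os
import shutil

def python_locations(hardcoded='/usr/bin/python'):
    """Return (what 'env python' would find on the PATH,
    whether the hard-coded interpreter path is executable)."""
    return shutil.which('python'), os.access(hardcoded, os.X_OK)

# If the first value is set but the second is False (or vice versa),
# rocks-dist and kickstart.cgi will disagree, as described above.
on_path, hardcoded_ok = python_locations()
print('env python finds:', on_path)
print('/usr/bin/python executable:', hardcoded_ok)
```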

> Yep, I did that, but only *AFTER* getting the error. [Thought it was
> generated by the rocks-dist sequence, but apparently not.] Go ahead.
> Move it back. Same difference.
>
> Vicky
>
> Mason J. Katz wrote:
> > It looks like someone moved the profiles directory to profiles.orig.
> >
> > -mjk
> >
> >
> > [root at rocks14 install]# ls -l
> > total 56
> > drwxr-sr-x    3 root     wheel        4096 Dec 10 21:16 cdrom
> > drwxrwsr-x    5 root     wheel        4096 Dec 10 20:38 contrib.orig
> > drwxr-sr-x    3 root     wheel        4096 Dec 10 21:07 ftp.rocksclusters.org
> > drwxr-sr-x    3 root     wheel        4096 Dec 10 20:38 ftp.rocksclusters.org.orig
> > -r-xrwsr-x    1 root     wheel       19254 Sep  3 12:40 kickstart.cgi
> > drwxr-xr-x    3 root     root         4096 Dec 10 20:38 profiles.orig
> > drwxr-sr-x    3 root     wheel        4096 Dec 10 21:15 rocks-dist
> > drwxrwsr-x    3 root     wheel        4096 Dec 10 20:38 rocks-dist.orig
> > drwxr-sr-x    3 root     wheel        4096 Dec 10 21:02 src
> > drwxr-sr-x    4 root     wheel        4096 Dec 10 20:49 src.foo
> > On Dec 10, 2003, at 2:43 PM, V. Rowley wrote:
> >
> >> When I run this:
> >>
> >> [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ;
> >> rocks-dist --dist=cdrom cdrom
> >>
> >> on a server installed with ROCKS 3.0.0, I eventually get this:
> >>
> >>> Cleaning distribution
> >>> Resolving versions (RPMs)
> >>> Resolving versions (SRPMs)
> >>> Adding support for rebuild distribution from source
> >>> Creating files (symbolic links - fast)
> >>> Creating symlinks to kickstart files
> >>> Fixing Comps Database
> >>> Generating hdlist (rpm database)
> >>> Patching second stage loader (eKV, partioning, ...)
> >>> patching "rocks-ekv" into distribution ...
> >>> patching "rocks-piece-pipe" into distribution ...
> >>> patching "PyXML" into distribution ...
> >>> patching "expat" into distribution ...
> >>> patching "rocks-pylib" into distribution ...
> >>> patching "MySQL-python" into distribution ...
> >>> patching "rocks-kickstart" into distribution ...
> >>> patching "rocks-kickstart-profiles" into distribution ...
> >>> patching "rocks-kickstart-dtds" into distribution ...
> >>> building CRAM filesystem ...
> >>> Cleaning distribution
> >>> Resolving versions (RPMs)
> >>> Resolving versions (SRPMs)
> >>> Creating symlinks to kickstart files
> >>> Generating hdlist (rpm database)
> >>> Segregating RPMs (rocks, non-rocks)
> >>> sh: ./kickstart.cgi: No such file or directory
> >>> sh: ./kickstart.cgi: No such file or directory
> >>> Traceback (innermost last):
> >>>   File "/opt/rocks/bin/rocks-dist", line 807, in ?
> >>>     app.run()
> >>>   File "/opt/rocks/bin/rocks-dist", line 623, in run
> >>>     eval('self.command_%s()' % (command))
> >>>   File "<string>", line 0, in ?
> >>>   File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom
> >>>     builder.build()
> >>>   File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build
> >>>     (rocks, nonrocks) = self.segregateRPMS()
> >>>   File "/opt/rocks/lib/python/rocks/build.py", line 1107, in segregateRPMS
> >>>     for pkg in ks.getSection('packages'):
> >>> TypeError: loop over non-sequence
> >>
> >>
> >> Any ideas?
> >>
> >> --
> >> Vicky Rowley                               email: vrowley at ucsd.edu
> >> Biomedical Informatics Research Network    work: (858) 536-5980
> >> University of California, San Diego        fax: (858) 822-0828
> >> 9500 Gilman Drive
> >> La Jolla, CA 92093-0715
> >>
> >>
> >> See pictures from our trip to China at http://www.sagacitech.com/Chinaweb
> >
> >
>
> --
> Vicky Rowley                               email: vrowley at ucsd.edu
> Biomedical Informatics Research Network    work: (858) 536-5980
> University of California, San Diego        fax: (858) 822-0828
> 9500 Gilman Drive
> La Jolla, CA 92093-0715
>
>
> See pictures from our trip to China at http://www.sagacitech.com/Chinaweb
>
>

From naihh at imcb.a-star.edu.sg Wed Dec 10 17:45:18 2003
From: naihh at imcb.a-star.edu.sg (Nai Hong Hwa Francis)
Date: Thu, 11 Dec 2003 09:45:18 +0800
Subject: [Rocks-Discuss]RE: Do you have a list of the various models of Gigabit Ethernet Interfaces compatible to Rocks 3?
Message-ID: <5E118EED7CC277468A275F11EEEC39B94CCD66@EXIMCB2.imcb.a-star.edu.sg>

Hi All,

Do you have a list of the various gigabit Ethernet interfaces that are
compatible with Rocks 3?

I am changing my nodes' connectivity from 10/100 to 1000.

Has anyone done that, and what are the differences in performance or
turnaround time?

Has anyone successfully built a set of grid compute nodes using Rocks 3?


Thanks and Regards

Nai Hong Hwa Francis
Institute of Molecular and Cell Biology (A*STAR)
30 Medical Drive
Singapore 117609.
DID: (65) 6874-6196

-----Original Message-----
From: npaci-rocks-discussion-request at sdsc.edu
[mailto:npaci-rocks-discussion-request at sdsc.edu]
Sent: Thursday, December 11, 2003 9:25 AM
To: npaci-rocks-discussion at sdsc.edu
Subject: npaci-rocks-discussion digest, Vol 1 #641 - 13 msgs

Send npaci-rocks-discussion mailing list submissions to
	npaci-rocks-discussion at sdsc.edu

To subscribe or unsubscribe via the World Wide Web, visit

	http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion
or, via email, send a message with subject or body 'help' to

npaci-rocks-discussion-request at sdsc.edu

You can reach the person managing the list at
	npaci-rocks-discussion-admin at sdsc.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of npaci-rocks-discussion digest..."

Today's Topics:

   1. Non-homogenous legacy hardware (Chris Dwan (CCGB))
   2. Error during Make when building a new install floppy (Terrence Martin)
   3. Re: Error during Make when building a new install floppy (Tim Carlson)
   4. Re: Non-homogenous legacy hardware (Tim Carlson)
   5. ssh_known_hosts and ganglia (Jag)
   6. Re: ssh_known_hosts and ganglia (Mason J. Katz)
   7. "TypeError: loop over non-sequence" when trying to build CD distro (V. Rowley)
   8. Re: one node short in "labels" (Greg Bruno)
   9. Re: "TypeError: loop over non-sequence" when trying to build CD distro (Mason J. Katz)
  10. Re: "TypeError: loop over non-sequence" when trying to build CD distro (V. Rowley)
  11. Re: "TypeError: loop over non-sequence" when trying to build CD distro (Tim Carlson)

--__--__--

Message: 1
Date: Wed, 10 Dec 2003 14:04:53 -0600 (CST)
From: "Chris Dwan (CCGB)" <cdwan at mail.ahc.umn.edu>
To: npaci-rocks-discussion at sdsc.edu
Subject: [Rocks-Discuss]Non-homogenous legacy hardware

I am integrating legacy systems into a ROCKS cluster, and have hit a
snag with the auto-partition configuration: The new (old) systems have
SCSI disks, while old (new) ones contain IDE. This is a non-issue so
long as the initial install does its default partitioning. However, I
have a "replace-auto-partition.xml" file which is unworkable for the
SCSI-based systems since it makes specific reference to "hda" rather
than "sda."

I would like to have a site-nodes/replace-auto-partition.xml file with a
conditional such that "hda" or "sda" is used, based on the name of the
node (or some other criterion).

Is this possible?

Thanks, in advance. If this is out there on the mailing list archives, a
pointer would be greatly appreciated.

-Chris Dwan
 The University of Minnesota

--__--__--

Message: 2
Date: Wed, 10 Dec 2003 12:09:11 -0800
From: Terrence Martin <tmartin at physics.ucsd.edu>
To: npaci-rocks-discussion <npaci-rocks-discussion at sdsc.edu>
Subject: [Rocks-Discuss]Error during Make when building a new install floppy

I get the following error when I try to rebuild a boot floppy for rocks.

This is with the default CVS checkout with an update today according to the rocks userguide. I have not actually attempted to make any changes.

make[3]: Leaving directory `/home/install/rocks/src/rocks/boot/7.3/loader/anaconda-7.3/loader'
make[2]: Leaving directory `/home/install/rocks/src/rocks/boot/7.3/loader/anaconda-7.3'
strip -o loader anaconda-7.3/loader/loader
strip: anaconda-7.3/loader/loader: No such file or directory
make[1]: *** [loader] Error 1
make[1]: Leaving directory `/home/install/rocks/src/rocks/boot/7.3/loader'
make: *** [loader] Error 2

Of course I could avoid all of this together and just put my binary module into the appropriate location in the boot image.

Would it be correct to modify the following image file with my changes and then write it to a floppy via dd?

/home/install/ftp.rocksclusters.org/pub/rocks/rocks-3.0.0/rocks-dist/7.3/en/os/i386/images/bootnet.img


Basically I am injecting an updated e1000 driver with changes to pcitable to support the address of my gigabit cards.

Terrence

--__--__--

Message: 3
Date: Wed, 10 Dec 2003 12:40:41 -0800 (PST)
From: Tim Carlson <tim.carlson at pnl.gov>
Subject: Re: [Rocks-Discuss]Error during Make when building a new install floppy
To: Terrence Martin <tmartin at physics.ucsd.edu>
Cc: npaci-rocks-discussion <npaci-rocks-discussion at sdsc.edu>
Reply-to: Tim Carlson <tim.carlson at pnl.gov>

On Wed, 10 Dec 2003, Terrence Martin wrote:

> I get the following error when I try to rebuild a boot floppy for
> rocks.
>

You can't make a boot floppy with Rocks 3.0. That isn't supported. Or at
least it wasn't the last time I checked.

> Of course I could avoid all of this together and just put my binary
> module into the appropriate location in the boot image.
>
> Would it be correct to modify the following image file with my changes
> and then write it to a floppy via dd?
>
> /home/install/ftp.rocksclusters.org/pub/rocks/rocks-3.0.0/rocks-dist/7.3/en/os/i386/images/bootnet.img
>
> Basically I am injecting an updated e1000 driver with changes to
> pcitable to support the address of my gigabit cards.

Modifying the bootnet.img is about 1/3 of what you need to do if you go
down that path. You also need to work on netstg1.img and you'll need to
update the driver in the kernel rpm that gets installed on the box. None
of this is trivial.

If it were me, I would go down the same path I took for updating the
AIC79XX driver:

https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-October/003533.html

Tim

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support


--__--__--

Message: 4
Date: Wed, 10 Dec 2003 12:52:38 -0800 (PST)
From: Tim Carlson <tim.carlson at pnl.gov>
Subject: Re: [Rocks-Discuss]Non-homogenous legacy hardware
To: "Chris Dwan (CCGB)" <cdwan at mail.ahc.umn.edu>
Cc: npaci-rocks-discussion at sdsc.edu
Reply-to: Tim Carlson <tim.carlson at pnl.gov>

On Wed, 10 Dec 2003, Chris Dwan (CCGB) wrote:

>
> I am integrating legacy systems into a ROCKS cluster, and have hit a
> snag with the auto-partition configuration: The new (old) systems have
> SCSI disks, while old (new) ones contain IDE. This is a non-issue so
> long as the initial install does its default partitioning. However, I
> have a "replace-auto-partition.xml" file which is unworkable for the SCSI
> based systems since it makes specific reference to "hda" rather than
> "sda."

If you have just a single drive, then you should be able to skip the
"--ondisk" bits of your "part" command.

Otherwise, you would first have to do something ugly like the following:

http://penguin.epfl.ch/slides/kickstart/ks.cfg

You could probably (maybe) wrap most of that in an

<eval sh="bash"></eval>

block in the <main> block.

Just guessing.. haven't tried this.

Tim

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support
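In the same spirit as Tim's guess above, here is a sketch of the conditional Chris asked for (our illustration, not a tested Rocks recipe): inspect /proc/partitions at install time and emit the right disk name for the "part ... --ondisk" lines. The function name is ours, and the file path is a parameter so the logic can be exercised against a canned file.

```python
def pick_root_disk(partitions_path='/proc/partitions'):
    """Return 'sda' when a SCSI disk is registered with the kernel,
    otherwise fall back to 'hda' (the IDE case)."""
    try:
        with open(partitions_path) as fh:
            # Last whitespace-separated field on each line is the
            # device name (the header line's "name" is harmless here).
            names = [line.split()[-1] for line in fh if line.strip()]
    except OSError:
        return 'hda'
    return 'sda' if 'sda' in names else 'hda'

print(pick_root_disk())
```

The same test could of course be written directly in shell inside the `<eval>` block; the point is only that one lookup decides "hda" vs "sda" instead of hard-coding it per node.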

--__--__--

Message: 5
From: Jag <agrajag at dragaera.net>
To: npaci-rocks-discussion at sdsc.edu
Date: Wed, 10 Dec 2003 13:21:07 -0500
Subject: [Rocks-Discuss]ssh_known_hosts and ganglia

I noticed a previous post on this list
(https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-May/001934.html)
indicating that Rocks distributes ssh keys for all the nodes over
ganglia. Can anyone enlighten me as to how this is done?


I looked through the ganglia docs and didn't see anything indicating how
to do this, so I'm assuming Rocks made some changes. Unfortunately the
rocks iso images don't seem to contain srpms, so I'm now coming here.
What did Rocks do to ganglia to make the distribution of ssh keys work?

Also, does anyone know where Rocks SRPMs can be found? I've done quite
a bit of searching, but haven't found them anywhere.

--__--__--

Message: 6
Cc: npaci-rocks-discussion at sdsc.edu
From: "Mason J. Katz" <mjk at sdsc.edu>
Subject: Re: [Rocks-Discuss]ssh_known_hosts and ganglia
Date: Wed, 10 Dec 2003 14:39:15 -0800
To: Jag <agrajag at dragaera.net>

Most of the SRPMS are on our FTP site, but we've screwed this up before.
The SRPMS are entirely Rocks specific so they are of little value
outside of Rocks. You can also check out our CVS tree
(cvs.rocksclusters.org), where rocks/src/ganglia shows what we add. We
have a ganglia-python package we created to allow us to write our own
metrics at a higher level than the provided gmetric application. We've
also moved from this method to a single cluster-wide ssh key for Rocks
3.1.

-mjk

On Dec 10, 2003, at 10:21 AM, Jag wrote:

> I noticed a previous post on this list
> (https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-May/001934.html)
> indicating that Rocks distributes ssh keys for all the nodes over
> ganglia. Can anyone enlighten me as to how this is done?
>
> I looked through the ganglia docs and didn't see anything indicating
> how
> to do this, so I'm assuming Rocks made some changes. Unfortunately the
> rocks iso images don't seem to contain srpms, so I'm now coming here.
> What did Rocks do to ganglia to make the distribution of ssh keys work?
>
> Also, does anyone know where Rocks SRPMs can be found? I've done quite
> a bit of searching, but haven't found them anywhere.

--__--__--

Message: 7
Date: Wed, 10 Dec 2003 14:43:49 -0800
From: "V. Rowley" <vrowley at ucsd.edu>
To: npaci-rocks-discussion at sdsc.edu
Subject: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying to build CD distro


When I run this:

[root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ; rocks-dist --dist=cdrom cdrom

on a server installed with ROCKS 3.0.0, I eventually get this:

> Cleaning distribution
> Resolving versions (RPMs)
> Resolving versions (SRPMs)
> Adding support for rebuild distribution from source
> Creating files (symbolic links - fast)
> Creating symlinks to kickstart files
> Fixing Comps Database
> Generating hdlist (rpm database)
> Patching second stage loader (eKV, partioning, ...)
> patching "rocks-ekv" into distribution ...
> patching "rocks-piece-pipe" into distribution ...
> patching "PyXML" into distribution ...
> patching "expat" into distribution ...
> patching "rocks-pylib" into distribution ...
> patching "MySQL-python" into distribution ...
> patching "rocks-kickstart" into distribution ...
> patching "rocks-kickstart-profiles" into distribution ...
> patching "rocks-kickstart-dtds" into distribution ...
> building CRAM filesystem ...
> Cleaning distribution
> Resolving versions (RPMs)
> Resolving versions (SRPMs)
> Creating symlinks to kickstart files
> Generating hdlist (rpm database)
> Segregating RPMs (rocks, non-rocks)
> sh: ./kickstart.cgi: No such file or directory
> sh: ./kickstart.cgi: No such file or directory
> Traceback (innermost last):
>   File "/opt/rocks/bin/rocks-dist", line 807, in ?
>     app.run()
>   File "/opt/rocks/bin/rocks-dist", line 623, in run
>     eval('self.command_%s()' % (command))
>   File "<string>", line 0, in ?
>   File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom
>     builder.build()
>   File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build
>     (rocks, nonrocks) = self.segregateRPMS()
>   File "/opt/rocks/lib/python/rocks/build.py", line 1107, in segregateRPMS
>     for pkg in ks.getSection('packages'):
> TypeError: loop over non-sequence

Any ideas?

--
Vicky Rowley                               email: vrowley at ucsd.edu
Biomedical Informatics Research Network    work: (858) 536-5980
University of California, San Diego        fax: (858) 822-0828
9500 Gilman Drive
La Jolla, CA 92093-0715


See pictures from our trip to China at
http://www.sagacitech.com/Chinaweb

--__--__--

Message: 8
Cc: rocks <npaci-rocks-discussion at sdsc.edu>
From: Greg Bruno <bruno at rocksclusters.org>
Subject: Re: [Rocks-Discuss]one node short in "labels"
Date: Wed, 10 Dec 2003 15:12:49 -0800
To: Vincent Fox <vincent_b_fox at yahoo.com>

> So I go to the "labels" selection on the web page to print out the
> pretty labels. What a nice idea by the way!
>
> EXCEPT....it's one node short! I go up to 0-13 and this stops at
> 0-12. Any ideas where I should check to fix this?

yeah, we found this corner case -- it'll be fixed in the next release.

thanks for the bug report.

- gb

--__--__--

Message: 9
Cc: npaci-rocks-discussion at sdsc.edu
From: "Mason J. Katz" <mjk at sdsc.edu>
Subject: Re: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying to build CD distro
Date: Wed, 10 Dec 2003 15:16:27 -0800
To: "V. Rowley" <vrowley at ucsd.edu>

It looks like someone moved the profiles directory to profiles.orig.

-mjk

[root at rocks14 install]# ls -l
total 56
drwxr-sr-x    3 root     wheel        4096 Dec 10 21:16 cdrom
drwxrwsr-x    5 root     wheel        4096 Dec 10 20:38 contrib.orig
drwxr-sr-x    3 root     wheel        4096 Dec 10 21:07 ftp.rocksclusters.org
drwxr-sr-x    3 root     wheel        4096 Dec 10 20:38 ftp.rocksclusters.org.orig
-r-xrwsr-x    1 root     wheel       19254 Sep  3 12:40 kickstart.cgi
drwxr-xr-x    3 root     root         4096 Dec 10 20:38 profiles.orig
drwxr-sr-x    3 root     wheel        4096 Dec 10 21:15 rocks-dist
drwxrwsr-x    3 root     wheel        4096 Dec 10 20:38 rocks-dist.orig
drwxr-sr-x    3 root     wheel        4096 Dec 10 21:02 src
drwxr-sr-x    4 root     wheel        4096 Dec 10 20:49 src.foo

On Dec 10, 2003, at 2:43 PM, V. Rowley wrote:

> When I run this:
>
> [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ;
> rocks-dist --dist=cdrom cdrom
>
> on a server installed with ROCKS 3.0.0, I eventually get this:
>
>> Cleaning distribution
>> Resolving versions (RPMs)
>> Resolving versions (SRPMs)
>> Adding support for rebuild distribution from source
>> Creating files (symbolic links - fast)
>> Creating symlinks to kickstart files
>> Fixing Comps Database
>> Generating hdlist (rpm database)
>> Patching second stage loader (eKV, partioning, ...)
>> patching "rocks-ekv" into distribution ...
>> patching "rocks-piece-pipe" into distribution ...
>> patching "PyXML" into distribution ...
>> patching "expat" into distribution ...
>> patching "rocks-pylib" into distribution ...
>> patching "MySQL-python" into distribution ...
>> patching "rocks-kickstart" into distribution ...
>> patching "rocks-kickstart-profiles" into distribution ...
>> patching "rocks-kickstart-dtds" into distribution ...
>> building CRAM filesystem ...
>> Cleaning distribution
>> Resolving versions (RPMs)
>> Resolving versions (SRPMs)
>> Creating symlinks to kickstart files
>> Generating hdlist (rpm database)
>> Segregating RPMs (rocks, non-rocks)
>> sh: ./kickstart.cgi: No such file or directory
>> sh: ./kickstart.cgi: No such file or directory
>> Traceback (innermost last):
>>   File "/opt/rocks/bin/rocks-dist", line 807, in ?
>>     app.run()
>>   File "/opt/rocks/bin/rocks-dist", line 623, in run
>>     eval('self.command_%s()' % (command))
>>   File "<string>", line 0, in ?
>>   File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom
>>     builder.build()
>>   File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build
>>     (rocks, nonrocks) = self.segregateRPMS()
>>   File "/opt/rocks/lib/python/rocks/build.py", line 1107, in segregateRPMS
>>     for pkg in ks.getSection('packages'):
>> TypeError: loop over non-sequence
>
> Any ideas?
>
> --
> Vicky Rowley                               email: vrowley at ucsd.edu
> Biomedical Informatics Research Network    work: (858) 536-5980
> University of California, San Diego        fax: (858) 822-0828
> 9500 Gilman Drive
> La Jolla, CA 92093-0715
>
>
> See pictures from our trip to China at
> http://www.sagacitech.com/Chinaweb

--__--__--

Message: 10
Date: Wed, 10 Dec 2003 16:50:16 -0800
From: "V. Rowley" <vrowley at ucsd.edu>
To: "Mason J. Katz" <mjk at sdsc.edu>
CC: npaci-rocks-discussion at sdsc.edu
Subject: Re: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying to build CD distro

Yep, I did that, but only *AFTER* getting the error. [Thought it was generated by the rocks-dist sequence, but apparently not.] Go ahead. Move it back. Same difference.

Vicky

Mason J. Katz wrote:
> It looks like someone moved the profiles directory to profiles.orig.
>
> -mjk
>
>
> [root at rocks14 install]# ls -l
> total 56
> drwxr-sr-x    3 root     wheel        4096 Dec 10 21:16 cdrom
> drwxrwsr-x    5 root     wheel        4096 Dec 10 20:38 contrib.orig
> drwxr-sr-x    3 root     wheel        4096 Dec 10 21:07 ftp.rocksclusters.org
> drwxr-sr-x    3 root     wheel        4096 Dec 10 20:38 ftp.rocksclusters.org.orig
> -r-xrwsr-x    1 root     wheel       19254 Sep  3 12:40 kickstart.cgi
> drwxr-xr-x    3 root     root         4096 Dec 10 20:38 profiles.orig
> drwxr-sr-x    3 root     wheel        4096 Dec 10 21:15 rocks-dist
> drwxrwsr-x    3 root     wheel        4096 Dec 10 20:38 rocks-dist.orig
> drwxr-sr-x    3 root     wheel        4096 Dec 10 21:02 src
> drwxr-sr-x    4 root     wheel        4096 Dec 10 20:49 src.foo
> On Dec 10, 2003, at 2:43 PM, V. Rowley wrote:
>
>> When I run this:
>>
>> [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ;
>> rocks-dist --dist=cdrom cdrom
>>
>> on a server installed with ROCKS 3.0.0, I eventually get this:
>>
>>> Cleaning distribution
>>> Resolving versions (RPMs)
>>> Resolving versions (SRPMs)
>>> Adding support for rebuild distribution from source
>>> Creating files (symbolic links - fast)
>>> Creating symlinks to kickstart files
>>> Fixing Comps Database
>>> Generating hdlist (rpm database)
>>> Patching second stage loader (eKV, partioning, ...)
>>> patching "rocks-ekv" into distribution ...
>>> patching "rocks-piece-pipe" into distribution ...
>>> patching "PyXML" into distribution ...
>>> patching "expat" into distribution ...
>>> patching "rocks-pylib" into distribution ...
>>> patching "MySQL-python" into distribution ...
>>> patching "rocks-kickstart" into distribution ...
>>> patching "rocks-kickstart-profiles" into distribution ...
>>> patching "rocks-kickstart-dtds" into distribution ...
>>> building CRAM filesystem ...
>>> Cleaning distribution
>>> Resolving versions (RPMs)
>>> Resolving versions (SRPMs)
>>> Creating symlinks to kickstart files
>>> Generating hdlist (rpm database)
>>> Segregating RPMs (rocks, non-rocks)
>>> sh: ./kickstart.cgi: No such file or directory
>>> sh: ./kickstart.cgi: No such file or directory
>>> Traceback (innermost last):
>>>   File "/opt/rocks/bin/rocks-dist", line 807, in ?
>>>     app.run()
>>>   File "/opt/rocks/bin/rocks-dist", line 623, in run
>>>     eval('self.command_%s()' % (command))
>>>   File "<string>", line 0, in ?
>>>   File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom
>>>     builder.build()
>>>   File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build
>>>     (rocks, nonrocks) = self.segregateRPMS()
>>>   File "/opt/rocks/lib/python/rocks/build.py", line 1107, in segregateRPMS
>>>     for pkg in ks.getSection('packages'):
>>> TypeError: loop over non-sequence
>>
>>
>> Any ideas?
>>
>> --
>> Vicky Rowley                               email: vrowley at ucsd.edu
>> Biomedical Informatics Research Network    work: (858) 536-5980
>> University of California, San Diego        fax: (858) 822-0828
>> 9500 Gilman Drive
>> La Jolla, CA 92093-0715
>>
>>
>> See pictures from our trip to China at http://www.sagacitech.com/Chinaweb
>
>
>

--
Vicky Rowley                               email: vrowley at ucsd.edu
Biomedical Informatics Research Network    work: (858) 536-5980
University of California, San Diego        fax: (858) 822-0828
9500 Gilman Drive
La Jolla, CA 92093-0715

See pictures from our trip to China at
http://www.sagacitech.com/Chinaweb

--__--__--

Message: 11
Date: Wed, 10 Dec 2003 17:23:25 -0800 (PST)
From: Tim Carlson <tim.carlson at pnl.gov>
Subject: Re: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying to build CD distro
To: "V. Rowley" <vrowley at ucsd.edu>
Cc: "Mason J. Katz" <mjk at sdsc.edu>, npaci-rocks-discussion at sdsc.edu
Reply-to: Tim Carlson <tim.carlson at pnl.gov>

On Wed, 10 Dec 2003, V. Rowley wrote:

Did you remove python by chance? kickstart.cgi calls python directly in
/usr/bin/python while rocks-dist does an "env python"

Tim

> Yep, I did that, but only *AFTER* getting the error. [Thought it was
> generated by the rocks-dist sequence, but apparently not.] Go ahead.
> Move it back. Same difference.
>
> Vicky
>
> Mason J. Katz wrote:
> > It looks like someone moved the profiles directory to profiles.orig.
> >
> >     -mjk
> >
> > [root at rocks14 install]# ls -l
> > total 56
> > drwxr-sr-x    3 root  wheel   4096 Dec 10 21:16 cdrom
> > drwxrwsr-x    5 root  wheel   4096 Dec 10 20:38 contrib.orig
> > drwxr-sr-x    3 root  wheel   4096 Dec 10 21:07 ftp.rocksclusters.org
> > drwxr-sr-x    3 root  wheel   4096 Dec 10 20:38 ftp.rocksclusters.org.orig
> > -r-xrwsr-x    1 root  wheel  19254 Sep  3 12:40 kickstart.cgi
> > drwxr-xr-x    3 root  root    4096 Dec 10 20:38 profiles.orig
> > drwxr-sr-x    3 root  wheel   4096 Dec 10 21:15 rocks-dist
> > drwxrwsr-x    3 root  wheel   4096 Dec 10 20:38 rocks-dist.orig
> > drwxr-sr-x    3 root  wheel   4096 Dec 10 21:02 src
> > drwxr-sr-x    4 root  wheel   4096 Dec 10 20:49 src.foo
> > On Dec 10, 2003, at 2:43 PM, V. Rowley wrote:
> >
> >> When I run this:
> >>
> >> [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ;
> >> rocks-dist --dist=cdrom cdrom
> >>
> >> on a server installed with ROCKS 3.0.0, I eventually get this:
> >>
> >>> Cleaning distribution
> >>> Resolving versions (RPMs)
> >>> Resolving versions (SRPMs)
> >>> Adding support for rebuild distribution from source
> >>> Creating files (symbolic links - fast)
> >>> Creating symlinks to kickstart files
> >>> Fixing Comps Database
> >>> Generating hdlist (rpm database)
> >>> Patching second stage loader (eKV, partioning, ...)
> >>>   patching "rocks-ekv" into distribution ...
> >>>   patching "rocks-piece-pipe" into distribution ...
> >>>   patching "PyXML" into distribution ...
> >>>   patching "expat" into distribution ...
> >>>   patching "rocks-pylib" into distribution ...
> >>>   patching "MySQL-python" into distribution ...
> >>>   patching "rocks-kickstart" into distribution ...
> >>>   patching "rocks-kickstart-profiles" into distribution ...
> >>>   patching "rocks-kickstart-dtds" into distribution ...
> >>>   building CRAM filesystem ...
> >>> Cleaning distribution
> >>> Resolving versions (RPMs)
> >>> Resolving versions (SRPMs)
> >>> Creating symlinks to kickstart files
> >>> Generating hdlist (rpm database)
> >>> Segregating RPMs (rocks, non-rocks)
> >>> sh: ./kickstart.cgi: No such file or directory
> >>> sh: ./kickstart.cgi: No such file or directory
> >>> Traceback (innermost last):
> >>>   File "/opt/rocks/bin/rocks-dist", line 807, in ?
> >>>     app.run()
> >>>   File "/opt/rocks/bin/rocks-dist", line 623, in run
> >>>     eval('self.command_%s()' % (command))
> >>>   File "<string>", line 0, in ?
> >>>   File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom
> >>>     builder.build()
> >>>   File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build
> >>>     (rocks, nonrocks) = self.segregateRPMS()
> >>>   File "/opt/rocks/lib/python/rocks/build.py", line 1107, in
> >>> segregateRPMS
> >>>     for pkg in ks.getSection('packages'):
> >>> TypeError: loop over non-sequence
> >>
> >> Any ideas?
> >>
> >> --
> >> Vicky Rowley                             email: vrowley at ucsd.edu
> >> Biomedical Informatics Research Network  work:  (858) 536-5980
> >> University of California, San Diego      fax:   (858) 822-0828
> >> 9500 Gilman Drive
> >> La Jolla, CA 92093-0715
> >>
> >> See pictures from our trip to China at
> >> http://www.sagacitech.com/Chinaweb


>
> --
> Vicky Rowley                                email: vrowley at ucsd.edu
> Biomedical Informatics Research Network     work:  (858) 536-5980
> University of California, San Diego         fax:   (858) 822-0828
> 9500 Gilman Drive
> La Jolla, CA 92093-0715
>
> See pictures from our trip to China at
> http://www.sagacitech.com/Chinaweb

--__--__--

_______________________________________________
npaci-rocks-discussion mailing list
npaci-rocks-discussion at sdsc.edu
http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion

End of npaci-rocks-discussion Digest

DISCLAIMER: This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its contents to any other person as it may be an offence under the Official Secrets Act. Thank you.

From tmartin at physics.ucsd.edu Wed Dec 10 18:03:41 2003
From: tmartin at physics.ucsd.edu (Terrence Martin)
Date: Wed, 10 Dec 2003 18:03:41 -0800
Subject: [Rocks-Discuss]Rocks 3.0.0
Message-ID: <[email protected]>

I am having a problem on install of rocks 3.0.0 on my new cluster.

The python error occurs right after anaconda starts and just before the install asks for the roll CDROM.

The error refers to an inability to find or load rocks.file. The error is associated, I think, with the window that pops up and asks you to put the roll CDROM in.

The process I followed to get to this point is

Put the Rocks 3.0.0 CDROM into the CDROM drive
Boot the system
At the prompt type frontend
Wait till anaconda starts
Error referring to unable to load rocks.file

I have successfully installed rocks on a smaller cluster but that has


different hardware. I used the same CDROM for both installs.

Any thoughts?

Terrence

From vrowley at ucsd.edu Wed Dec 10 19:52:49 2003
From: vrowley at ucsd.edu (V. Rowley)
Date: Wed, 10 Dec 2003 19:52:49 -0800
Subject: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying to build CD distro
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

Looks like python is okay:

> [root at rocks14 birn-oracle1]# which python
> /usr/bin/python
> [root at rocks14 birn-oracle1]# python --help
> Unknown option: --
> usage: python [option] ... [-c cmd | file | -] [arg] ...
> Options and arguments (and corresponding environment variables):
> -d     : debug output from parser (also PYTHONDEBUG=x)
> -i     : inspect interactively after running script, (also PYTHONINSPECT=x)
>          and force prompts, even if stdin does not appear to be a terminal
> -O     : optimize generated bytecode (a tad; also PYTHONOPTIMIZE=x)
> -OO    : remove doc-strings in addition to the -O optimizations
> -S     : don't imply 'import site' on initialization
> -t     : issue warnings about inconsistent tab usage (-tt: issue errors)
> -u     : unbuffered binary stdout and stderr (also PYTHONUNBUFFERED=x)
> -v     : verbose (trace import statements) (also PYTHONVERBOSE=x)
> -x     : skip first line of source, allowing use of non-Unix forms of #!cmd
> -X     : disable class based built-in exceptions
> -c cmd : program passed in as string (terminates option list)
> file   : program read from script file
> -      : program read from stdin (default; interactive mode if a tty)
> arg ...: arguments passed to program in sys.argv[1:]
> Other environment variables:
> PYTHONSTARTUP: file executed on interactive startup (no default)
> PYTHONPATH   : ':'-separated list of directories prefixed to the
>                default module search path. The result is sys.path.
> PYTHONHOME   : alternate <prefix> directory (or <prefix>:<exec_prefix>).
>                The default module search path uses <prefix>/python1.5.
> [root at rocks14 birn-oracle1]#

Tim Carlson wrote:
> On Wed, 10 Dec 2003, V. Rowley wrote:
>
> Did you remove python by chance? kickstart.cgi calls python directly in
> /usr/bin/python while rocks-dist does an "env python"
>
> Tim


> >> Yep, I did that, but only *AFTER* getting the error. [Thought it was
> >> generated by the rocks-dist sequence, but apparently not.] Go ahead.
> >> Move it back. Same difference.
> >>
> >> Vicky
> >>
> >> [remainder of the earlier thread quoted verbatim: Mason's profiles.orig
> >> directory listing, the rocks-dist command, and the "TypeError: loop over
> >> non-sequence" traceback, as in Message 11 above]

-- 
Vicky Rowley                                email: vrowley at ucsd.edu
Biomedical Informatics Research Network     work:  (858) 536-5980
University of California, San Diego         fax:   (858) 822-0828
9500 Gilman Drive
La Jolla, CA 92093-0715

See pictures from our trip to China at http://www.sagacitech.com/Chinaweb

From wyzhong78 at msn.com Wed Dec 10 20:38:53 2003
From: wyzhong78 at msn.com (zhong wenyu)
Date: Thu, 11 Dec 2003 12:38:53 +0800
Subject: [Rocks-Discuss]Rocks 3.0.0 problem:not able to boot up
Message-ID: <[email protected]>

>From: Greg Bruno <bruno at rocksclusters.org>
>To: "zhong wenyu" <wyzhong78 at msn.com>
>CC: npaci-rocks-discussion at sdsc.edu
>Subject: Re: [Rocks-Discuss]Rocks 3.0.0 problem:not able to boot up
>Date: Mon, 8 Dec 2003 15:31:08 -0800
>
>> I have installed Rocks 3.0.0 with default options successful,there
>> was not any trouble.But I boot it up,it stopped at beginning,just
>> show "GRUB" on the screen and waiting...
>
> when you built the frontend, did you start with the rocks base CD
> then add the HPC roll?
>
>  - gb

I have resolved this trouble, but I don't know why. I have one SCSI hard disk and one IDE disk on the frontend. I chose the SCSI disk to be the first HDD and installed "/" on it; then it could not boot up. Even with the IDE HDD disabled and a reinstall, it still could not boot. At last I chose the SCSI disk as the first HDD for the install, then made the IDE HDD the first disk to boot up, and it's OK! Must GRUB be installed on the IDE HDD?

thanks!


From wyzhong78 at msn.com Wed Dec 10 20:44:09 2003
From: wyzhong78 at msn.com (zhong wenyu)
Date: Thu, 11 Dec 2003 12:44:09 +0800
Subject: [Rocks-Discuss]I can't use xpbs in rocks
Message-ID: <[email protected]>

Hi, everyone!

I have installed rocks 2.3.2 and 3.0.0; xpbs can not be used in either of them.

typed: xpbs [enter]
showed: xpbs: initialization failed! output: invalid command name "Pref_Init"

thanks!



From phil at sdsc.edu Wed Dec 10 21:26:50 2003
From: phil at sdsc.edu (Philip Papadopoulos)
Date: Wed, 10 Dec 2003 21:26:50 -0800
Subject: [Rocks-Discuss]Rocks 3.0.0 problem:not able to boot up
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

There is a conflict between the way the BIOS numbers drives and the way the install kernel numbers them (and this is not standard). You should check in your BIOS whether you can select which is the boot device. If it just says "Hard Disk" (no choice between IDE and SCSI), then you are stuck with needing to have GRUB on the device that the BIOS thinks is the boot device. If you can choose, then SCSI can probably be made to work.

These sorts of issues (this is a general redhat/linux problem) can be quite troublesome (and annoying). We had some older HW that had two different types of SCSI controllers with drives on each controller. The boot kernel labeled /sda differently than the BIOS did. The install went fine, but the dreaded "OS Not Found" BIOS message appeared when rebooting. The cause was that the GRUB loader was being put on Linux's notion of /sda, but when the BIOS loaded, it found nothing (because GRUB was installed on the BIOS's idea of /sdb). For this particular machine, we were not able to change the BIOS's notion -- we had to force Linux to put the bootloader on Linux's idea of /sdb.

-P

zhong wenyu wrote:

>> From: Greg Bruno <bruno at rocksclusters.org>
>> To: "zhong wenyu" <wyzhong78 at msn.com>
>> CC: npaci-rocks-discussion at sdsc.edu
>> Subject: Re: [Rocks-Discuss]Rocks 3.0.0 problem:not able to boot up
>> Date: Mon, 8 Dec 2003 15:31:08 -0800
>>
>>> I have installed Rocks 3.0.0 with default options successful,there
>>> was not any trouble.But I boot it up,it stopped at beginning,just
>>> show "GRUB" on the screen and waiting...
>>
>> when you built the frontend, did you start with the rocks base CD
>> then add the HPC roll?
>>
>>  - gb
>
> I have raveled out this trouble.But I don't know why.
> I have one SCSI harddisk and one IDE disk On the frontend,I choose
> SCSI to be the first HDD and installed "/" on it.then it can not boot
> up.Even disabled the IDE HDD and install it again,It can not boot up
> also.at last I choose SCSI to be the first HDD and install,then choose
> IDE HDD to be the first and boot up, it's ok!
> GRUB must be installed on IDE HDD?
> thanks!
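Phil's point about GRUB's disk numbering can be checked in grub's device map. A minimal sketch (the map contents below are made up for illustration, not taken from the poster's machine; on a real frontend, inspect /boot/grub/device.map itself):

```shell
# Build a sample grub device.map (contents invented for this sketch).
map=$(mktemp)
printf '(hd0)\t/dev/hda\n(hd1)\t/dev/sda\n' > "$map"

# Whatever disk is mapped to (hd0) is where "setup (hd0)" writes the
# boot loader -- it must match the disk the BIOS actually boots first.
awk '$1 == "(hd0)" {print $2}' "$map"
rm -f "$map"
```

If the BIOS insists on booting the IDE disk, the map (and hence the boot loader) has to point there, which matches zhong's observation that making the IDE disk first finally booted.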

From mjk at sdsc.edu Wed Dec 10 22:04:57 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Wed, 10 Dec 2003 22:04:57 -0800
Subject: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying to build CD distro
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]>
Message-ID: <[email protected]>

Hi Vicky,

The following directory cannot resolve its symlinks anymore. If you move the profiles and mirror directories around, Rocks cannot find them to build kickstart files.

-mjk

[root at rocks14 default]# ls -l
total 16
lrwxrwxrwx    1 root  root    113 Nov 13 20:19 core.xml -> /home/install/ftp.rocksclusters.org/pub/rocks/rocks-3.0.0/rocks-dist/7.3/en/os/i386/build/graphs/default/core.xml
-rwxrwsr-x    1 root  wheel  3123 Sep  3 17:10 hpc.xml
-rwxr-xr-x    1 root  root    495 Sep  9 22:55 patch.xml
-rwxrwsr-x    1 root  wheel   452 Sep  3 17:10 root.xml
lrwxrwxrwx    1 root  root    112 Nov 13 20:19 rsh.xml -> /home/install/ftp.rocksclusters.org/pub/rocks/rocks-3.0.0/rocks-dist/7.3/en/os/i386/build/graphs/default/rsh.xml
-rwxrwsr-x    1 root  wheel   923 Sep  3 17:10 sge.xml

On Dec 10, 2003, at 7:52 PM, V. Rowley wrote:

> Looks like python is okay:
>
>> [root at rocks14 birn-oracle1]# which python
>> /usr/bin/python
>> [root at rocks14 birn-oracle1]# python --help
>> Unknown option: --
>> usage: python [option] ... [-c cmd | file | -] [arg] ...
>> [full "python --help" output quoted verbatim, as in V. Rowley's
>> message above]
>
> Tim Carlson wrote:
>> On Wed, 10 Dec 2003, V. Rowley wrote:
>> Did you remove python by chance? kickstart.cgi calls python directly in
>> /usr/bin/python while rocks-dist does an "env python"
>> Tim
>>> [remainder of the earlier thread quoted verbatim: the profiles.orig
>>> directory listing, the rocks-dist command, and the "TypeError: loop
>>> over non-sequence" traceback, as in Message 11 above]
>
> --
> Vicky Rowley                                email: vrowley at ucsd.edu
> Biomedical Informatics Research Network     work:  (858) 536-5980
> University of California, San Diego         fax:   (858) 822-0828
> 9500 Gilman Drive
> La Jolla, CA 92093-0715
>
> See pictures from our trip to China at
> http://www.sagacitech.com/Chinaweb

From bruno at rocksclusters.org Wed Dec 10 22:31:11 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Wed, 10 Dec 2003 22:31:11 -0800
Subject: [Rocks-Discuss]Rocks 3.0.0
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

> I am having a problem on install of rocks 3.0.0 on my new cluster.
>
> The python error occurs right after anaconda starts and just before
> the install asks for the roll CDROM.
>
> The error refers to an inability to find or load rocks.file. The error
> is associated I think with the window that pops up and asks you to put
> the roll CDROM in.
>
> The process I followed to get to this point is
>
> Put the Rocks 3.0.0 CDROM into the CDROM drive
> Boot the system
> At the prompt type frontend
> Wait till anaconda starts
> Error referring to unable to load rocks.file.
>
> I have successfully installed rocks on a smaller cluster but that has
> different hardware. I used the same CDROM for both installs.
>
> Any thoughts?

hard to say -- but some folks had similar problems due to bad memory:

https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-February/001246.html

- gb

From vincent_b_fox at yahoo.com Wed Dec 10 22:43:21 2003
From: vincent_b_fox at yahoo.com (Vincent Fox)
Date: Wed, 10 Dec 2003 22:43:21 -0800 (PST)
Subject: [Rocks-Discuss]ATLAS rpm build problems on PII platform
In-Reply-To: <[email protected]>
Message-ID: <[email protected]>

Okay, here's the context diff as plain text. I test-applied it using "patch -p0 < atlas.patch" and did a compile on my PII box successfully. I can send it as an attachment or submit to CVS or some other way if you need:

*** atlas.spec.in.orig	Thu Dec 11 06:27:13 2003
--- atlas.spec.in	Thu Dec 11 06:30:46 2003
***************
*** 111,117 ****
--- 111,133 ----
  	y
  	" | make
+ elif [ $CPUID -eq 4 ]
+ then
+ 	#
+ 	# Pentium II
+ 	#
+ 	echo "0
+ 	y
+ 	y
+ 	n
+ 	y
+ 	linux
+ 	0
+ 	/usr/bin/g77
+ 	-O
+ 	y
+ 	" | make
  else
  	#

Greg Bruno <bruno at rocksclusters.org> wrote:
> Okay, came up my own quick hack:
>
> Edit atlas.spec.in, go to "other x86" section, remove
> 2 lines right above "linux", seems to make rpm now.
>
> A more formal patch would be put in a section for
> cpuid eq 4 with this correction I suppose.

if you provide the patch, we'll include it in our next release.

- gb

---------------------------------
Do you Yahoo!?
New Yahoo! Photos - easier uploading and sharing
-------------- next part --------------
An HTML attachment was scrubbed...
URL: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/attachments/20031210/be5c8b04/attachment-0001.html

From naihh at imcb.a-star.edu.sg Thu Dec 11 00:08:14 2003
From: naihh at imcb.a-star.edu.sg (Nai Hong Hwa Francis)
Date: Thu, 11 Dec 2003 16:08:14 +0800
Subject: [Rocks-Discuss]RE: Have anyone successfully build a set of grid compute nodes using Rocks?
Message-ID: <5E118EED7CC277468A275F11EEEC39B94CCDB9@EXIMCB2.imcb.a-star.edu.sg>

Hi,

Has anyone successfully built a set of grid compute nodes using Rocks 3? Anyone care to share?

Nai Hong Hwa Francis
Institute of Molecular and Cell Biology (A*STAR)
30 Medical Drive
Singapore 117609.
DID: (65) 6874-6196

-----Original Message-----
From: npaci-rocks-discussion-request at sdsc.edu
[mailto:npaci-rocks-discussion-request at sdsc.edu]
Sent: Thursday, December 11, 2003 11:54 AM
To: npaci-rocks-discussion at sdsc.edu
Subject: npaci-rocks-discussion digest, Vol 1 #642 - 4 msgs

Send npaci-rocks-discussion mailing list submissions to
	npaci-rocks-discussion at sdsc.edu

To subscribe or unsubscribe via the World Wide Web, visit


	http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion
or, via email, send a message with subject or body 'help' to

npaci-rocks-discussion-request at sdsc.edu

You can reach the person managing the list at
	npaci-rocks-discussion-admin at sdsc.edu

When replying, please edit your Subject line so it is more specific than "Re: Contents of npaci-rocks-discussion digest..."

Today's Topics:

 1. RE: Do you have a list of the various models of Gigabit Ethernet
    Interfaces compatible to Rocks 3? (Nai Hong Hwa Francis)
 2. Rocks 3.0.0 (Terrence Martin)
 3. Re: "TypeError: loop over non-sequence" when trying to build CD
    distro (V. Rowley)

--__--__--

Message: 1
Date: Thu, 11 Dec 2003 09:45:18 +0800
From: "Nai Hong Hwa Francis" <naihh at imcb.a-star.edu.sg>
To: <npaci-rocks-discussion at sdsc.edu>
Subject: [Rocks-Discuss]RE: Do you have a list of the various models of Gigabit Ethernet Interfaces compatible to Rocks 3?

Hi All,

Do you have a list of the various gigabit Ethernet interfaces that are compatible with Rocks 3?

I am changing my nodes connectivity from 10/100 to 1000.

Has anyone done that, and what are the differences in performance or turnaround time?

Thanks and Regards

Nai Hong Hwa Francis
Institute of Molecular and Cell Biology (A*STAR)
30 Medical Drive
Singapore 117609.
DID: (65) 6874-6196

-----Original Message-----
From: npaci-rocks-discussion-request at sdsc.edu
[mailto:npaci-rocks-discussion-request at sdsc.edu]
Sent: Thursday, December 11, 2003 9:25 AM
To: npaci-rocks-discussion at sdsc.edu
Subject: npaci-rocks-discussion digest, Vol 1 #641 - 13 msgs

Send npaci-rocks-discussion mailing list submissions to
	npaci-rocks-discussion at sdsc.edu


To subscribe or unsubscribe via the World Wide Web, visit
	http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion
or, via email, send a message with subject or body 'help' to

npaci-rocks-discussion-request at sdsc.edu

You can reach the person managing the list at
	npaci-rocks-discussion-admin at sdsc.edu

When replying, please edit your Subject line so it is more specific than "Re: Contents of npaci-rocks-discussion digest..."

Today's Topics:

 1. Non-homogenous legacy hardware (Chris Dwan (CCGB))
 2. Error during Make when building a new install floppy (Terrence Martin)
 3. Re: Error during Make when building a new install floppy (Tim Carlson)
 4. Re: Non-homogenous legacy hardware (Tim Carlson)
 5. ssh_known_hosts and ganglia (Jag)
 6. Re: ssh_known_hosts and ganglia (Mason J. Katz)
 7. "TypeError: loop over non-sequence" when trying to build CD distro (V. Rowley)
 8. Re: one node short in "labels" (Greg Bruno)
 9. Re: "TypeError: loop over non-sequence" when trying to build CD distro (Mason J. Katz)
 10. Re: "TypeError: loop over non-sequence" when trying to build CD distro (V. Rowley)
 11. Re: "TypeError: loop over non-sequence" when trying to build CD distro (Tim Carlson)

--__--__--

Message: 1
Date: Wed, 10 Dec 2003 14:04:53 -0600 (CST)
From: "Chris Dwan (CCGB)" <cdwan at mail.ahc.umn.edu>
To: npaci-rocks-discussion at sdsc.edu
Subject: [Rocks-Discuss]Non-homogenous legacy hardware

I am integrating legacy systems into a ROCKS cluster, and have hit a snag with the auto-partition configuration: the new (old) systems have SCSI disks, while the old (new) ones contain IDE. This is a non-issue so long as the initial install does its default partitioning. However, I have a "replace-auto-partition.xml" file which is unworkable for the SCSI-based systems, since it makes specific reference to "hda" rather than "sda."

I would like to have a site-nodes/replace-auto-partition.xml file with a conditional such that "hda" or "sda" is used, based on the name of the node (or some other criterion).

Is this possible?

Thanks, in advance. If this is out there on the mailing list archives, a pointer would be greatly appreciated.

-Chris Dwan
 The University of Minnesota

--__--__--

Message: 2
Date: Wed, 10 Dec 2003 12:09:11 -0800
From: Terrence Martin <tmartin at physics.ucsd.edu>
To: npaci-rocks-discussion <npaci-rocks-discussion at sdsc.edu>
Subject: [Rocks-Discuss]Error during Make when building a new install floppy

I get the following error when I try to rebuild a boot floppy for rocks.

This is with the default CVS checkout, with an update today according to the rocks userguide. I have not actually attempted to make any changes.

make[3]: Leaving directory `/home/install/rocks/src/rocks/boot/7.3/loader/anaconda-7.3/loader'
make[2]: Leaving directory `/home/install/rocks/src/rocks/boot/7.3/loader/anaconda-7.3'
strip -o loader anaconda-7.3/loader/loader
strip: anaconda-7.3/loader/loader: No such file or directory
make[1]: *** [loader] Error 1
make[1]: Leaving directory `/home/install/rocks/src/rocks/boot/7.3/loader'
make: *** [loader] Error 2

Of course I could avoid all of this altogether and just put my binary module into the appropriate location in the boot image.

Would it be correct to modify the following image file with my changes and then write it to a floppy via dd?

/home/install/ftp.rocksclusters.org/pub/rocks/rocks-3.0.0/rocks-dist/7.3/en/os/i386/images/bootnet.img

Basically I am injecting an updated e1000 driver with changes to pcitable to support the address of my gigabit cards.

Terrence
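For reference, anaconda's pcitable is a plain tab-separated file mapping PCI vendor/device ids to driver modules, so "changes to pcitable" amounts to appending a line for the new card. A sketch on a scratch copy (all device ids below are invented for illustration; the real file lives inside the boot/stage images, not at a fixed path):

```shell
# Work on a scratch copy; a real fix would edit the pcitable that lives
# inside bootnet.img / netstg1.img. Ids below are made up for the sketch.
pcitable=$(mktemp)
printf '0x8086\t0x100e\t"e1000"\t"Intel Gigabit"\n' > "$pcitable"

# Append an id the stock table does not know about yet.
printf '0x8086\t0x1234\t"e1000"\t"Intel Gigabit (newer card)"\n' >> "$pcitable"

grep -c '"e1000"' "$pcitable"   # now two e1000 entries
rm -f "$pcitable"
```

As Tim notes below, the pcitable edit alone is only part of the job; the matching driver module also has to exist in the images and the installed kernel rpm.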

--__--__--

Message: 3
Date: Wed, 10 Dec 2003 12:40:41 -0800 (PST)
From: Tim Carlson <tim.carlson at pnl.gov>
Subject: Re: [Rocks-Discuss]Error during Make when building a new install floppy
To: Terrence Martin <tmartin at physics.ucsd.edu>
Cc: npaci-rocks-discussion <npaci-rocks-discussion at sdsc.edu>
Reply-to: Tim Carlson <tim.carlson at pnl.gov>


On Wed, 10 Dec 2003, Terrence Martin wrote:

> I get the following error when I try to rebuild a boot floppy for rocks.

You can't make a boot floppy with Rocks 3.0. That isn't supported. Or at least it wasn't the last time I checked.

> Of course I could avoid all of this together and just put my binary
> module into the appropriate location in the boot image.
>
> Would it be correct to modify the following image file with my changes
> and then write it to a floppy via dd?
>
> /home/install/ftp.rocksclusters.org/pub/rocks/rocks-3.0.0/rocks-dist/7.3/en/os/i386/images/bootnet.img
>
> Basically I am injecting an updated e1000 driver with changes to
> pcitable to support the address of my gigabit cards.

Modifying the bootnet.img is about 1/3 of what you need to do if you go down that path. You also need to work on netstg1.img, and you'll need to update the driver in the kernel rpm that gets installed on the box. None of this is trivial.

If it were me, I would go down the same path I took for updating the AIC79XX driver

https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-October/003533.html

Tim

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support

--__--__--

Message: 4
Date: Wed, 10 Dec 2003 12:52:38 -0800 (PST)
From: Tim Carlson <tim.carlson at pnl.gov>
Subject: Re: [Rocks-Discuss]Non-homogenous legacy hardware
To: "Chris Dwan (CCGB)" <cdwan at mail.ahc.umn.edu>
Cc: npaci-rocks-discussion at sdsc.edu
Reply-to: Tim Carlson <tim.carlson at pnl.gov>

On Wed, 10 Dec 2003, Chris Dwan (CCGB) wrote:

> I am integrating legacy systems into a ROCKS cluster, and have hit a
> snag with the auto-partition configuration: The new (old) systems have
> SCSI disks, while old (new) ones contain IDE. This is a non-issue so
> long as the initial install does its default partitioning. However, I
> have a "replace-auto-partition.xml" file which is unworkable for the SCSI
> based systems since it makes specific reference to "hda" rather than
> "sda."

If you have just a single drive, then you should be able to skip the "--ondisk" bits of your "part" command

Otherwise, you would first have to do something ugly like the following:

http://penguin.epfl.ch/slides/kickstart/ks.cfg

You could probably (maybe) wrap most of that in an <eval sh="bash"></eval>

block in the <main> block.

Just guessing.. haven't tried this.
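A sketch of what that could look like, in the spirit of the ks.cfg trick above — entirely untested, the tag usage is only a reading of the Rocks XML kickstart convention, and the sizes and the SCSI probe are illustrative:

```xml
<!-- Hypothetical site-nodes/replace-auto-partition.xml sketch (untested). -->
<main>
  <eval sh="bash">
    # Emit "part" lines for whichever disk this node actually has:
    # SCSI nodes get sda, IDE nodes get hda.
    if grep -q "Direct-Access" /proc/scsi/scsi 2>/dev/null; then
      disk=sda
    else
      disk=hda
    fi
    echo "part / --size 4096 --ondisk $disk"
    echo "part swap --size 1024 --ondisk $disk"
    echo "part /export --size 1 --grow --ondisk $disk"
  </eval>
</main>
```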

Tim

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support

-- __--__--

Message: 5
From: Jag <agrajag at dragaera.net>
To: npaci-rocks-discussion at sdsc.edu
Date: Wed, 10 Dec 2003 13:21:07 -0500
Subject: [Rocks-Discuss]ssh_known_hosts and ganglia

I noticed a previous post on this list (https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-May/001934.html) indicating that Rocks distributes ssh keys for all the nodes over ganglia. Can anyone enlighten me as to how this is done?

I looked through the ganglia docs and didn't see anything indicating how to do this, so I'm assuming Rocks made some changes. Unfortunately the rocks iso images don't seem to contain srpms, so I'm now coming here. What did Rocks do to ganglia to make the distribution of ssh keys work?

Also, does anyone know where Rocks SRPMs can be found? I've done quite a bit of searching, but haven't found them anywhere.

-- __--__--

Message: 6
Cc: npaci-rocks-discussion at sdsc.edu
From: "Mason J. Katz" <mjk at sdsc.edu>
Subject: Re: [Rocks-Discuss]ssh_known_hosts and ganglia
Date: Wed, 10 Dec 2003 14:39:15 -0800
To: Jag <agrajag at dragaera.net>


Most of the SRPMS are on our FTP site, but we've screwed this up before. The SRPMS are entirely Rocks specific so they are of little value outside of Rocks. You can also check out our CVS tree (cvs.rocksclusters.org) where rocks/src/ganglia shows what we add. We have a ganglia-python package we created to allow us to write our own metrics at a higher level than the provided gmetric application. We've also moved from this method to a single cluster-wide ssh key for Rocks 3.1.

-mjk

On Dec 10, 2003, at 10:21 AM, Jag wrote:

> I noticed a previous post on this list
> (https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-May/001934.html)
> indicating that Rocks distributes ssh keys for all the nodes over
> ganglia. Can anyone enlighten me as to how this is done?
>
> I looked through the ganglia docs and didn't see anything indicating
> how to do this, so I'm assuming Rocks made some changes. Unfortunately the
> rocks iso images don't seem to contain srpms, so I'm now coming here.
> What did Rocks do to ganglia to make the distribution of ssh keys work?
>
> Also, does anyone know where Rocks SRPMs can be found? I've done quite
> a bit of searching, but haven't found them anywhere.

-- __--__--

Message: 7
Date: Wed, 10 Dec 2003 14:43:49 -0800
From: "V. Rowley" <vrowley at ucsd.edu>
To: npaci-rocks-discussion at sdsc.edu
Subject: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying to build CD distro

When I run this:

[root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ; rocks-dist --dist=cdrom cdrom

on a server installed with ROCKS 3.0.0, I eventually get this:

> Cleaning distribution
> Resolving versions (RPMs)
> Resolving versions (SRPMs)
> Adding support for rebuild distribution from source
> Creating files (symbolic links - fast)
> Creating symlinks to kickstart files
> Fixing Comps Database
> Generating hdlist (rpm database)
> Patching second stage loader (eKV, partioning, ...)
>   patching "rocks-ekv" into distribution ...
>   patching "rocks-piece-pipe" into distribution ...
>   patching "PyXML" into distribution ...
>   patching "expat" into distribution ...
>   patching "rocks-pylib" into distribution ...
>   patching "MySQL-python" into distribution ...
>   patching "rocks-kickstart" into distribution ...
>   patching "rocks-kickstart-profiles" into distribution ...
>   patching "rocks-kickstart-dtds" into distribution ...
>   building CRAM filesystem ...
> Cleaning distribution
> Resolving versions (RPMs)
> Resolving versions (SRPMs)
> Creating symlinks to kickstart files
> Generating hdlist (rpm database)
> Segregating RPMs (rocks, non-rocks)
> sh: ./kickstart.cgi: No such file or directory
> sh: ./kickstart.cgi: No such file or directory
> Traceback (innermost last):
>   File "/opt/rocks/bin/rocks-dist", line 807, in ?
>     app.run()
>   File "/opt/rocks/bin/rocks-dist", line 623, in run
>     eval('self.command_%s()' % (command))
>   File "<string>", line 0, in ?
>   File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom
>     builder.build()
>   File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build
>     (rocks, nonrocks) = self.segregateRPMS()
>   File "/opt/rocks/lib/python/rocks/build.py", line 1107, in segregateRPMS
>     for pkg in ks.getSection('packages'):
> TypeError: loop over non-sequence

Any ideas?

--
Vicky Rowley email: vrowley at ucsd.edu
Biomedical Informatics Research Network work: (858) 536-5980
University of California, San Diego fax: (858) 822-0828
9500 Gilman Drive
La Jolla, CA 92093-0715

See pictures from our trip to China at http://www.sagacitech.com/Chinaweb

-- __--__--

Message: 8
Cc: rocks <npaci-rocks-discussion at sdsc.edu>
From: Greg Bruno <bruno at rocksclusters.org>
Subject: Re: [Rocks-Discuss]one node short in "labels"
Date: Wed, 10 Dec 2003 15:12:49 -0800
To: Vincent Fox <vincent_b_fox at yahoo.com>

> So I go to the "labels" selection on the web page to print out the
> pretty labels. What a nice idea by the way!
>
> EXCEPT....it's one node short! I go up to 0-13 and this stops at
> 0-12. Any ideas where I should check to fix this?

yeah, we found this corner case -- it'll be fixed in the next release.

thanks for the bug report.

- gb

-- __--__--

Message: 9
Cc: npaci-rocks-discussion at sdsc.edu
From: "Mason J. Katz" <mjk at sdsc.edu>
Subject: Re: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying to build CD distro
Date: Wed, 10 Dec 2003 15:16:27 -0800
To: "V. Rowley" <vrowley at ucsd.edu>

It looks like someone moved the profiles directory to profiles.orig.

-mjk

[root at rocks14 install]# ls -l
total 56
drwxr-sr-x 3 root wheel 4096 Dec 10 21:16 cdrom
drwxrwsr-x 5 root wheel 4096 Dec 10 20:38 contrib.orig
drwxr-sr-x 3 root wheel 4096 Dec 10 21:07 ftp.rocksclusters.org
drwxr-sr-x 3 root wheel 4096 Dec 10 20:38 ftp.rocksclusters.org.orig
-r-xrwsr-x 1 root wheel 19254 Sep 3 12:40 kickstart.cgi
drwxr-xr-x 3 root root 4096 Dec 10 20:38 profiles.orig
drwxr-sr-x 3 root wheel 4096 Dec 10 21:15 rocks-dist
drwxrwsr-x 3 root wheel 4096 Dec 10 20:38 rocks-dist.orig
drwxr-sr-x 3 root wheel 4096 Dec 10 21:02 src
drwxr-sr-x 4 root wheel 4096 Dec 10 20:49 src.foo

On Dec 10, 2003, at 2:43 PM, V. Rowley wrote:

> When I run this:
>
> [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ; rocks-dist --dist=cdrom cdrom
>
> on a server installed with ROCKS 3.0.0, I eventually get this:
> [snip]

-- __--__--

Message: 10
Date: Wed, 10 Dec 2003 16:50:16 -0800
From: "V. Rowley" <vrowley at ucsd.edu>
To: "Mason J. Katz" <mjk at sdsc.edu>
CC: npaci-rocks-discussion at sdsc.edu
Subject: Re: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying to build CD distro

Yep, I did that, but only *AFTER* getting the error. [Thought it was generated by the rocks-dist sequence, but apparently not.] Go ahead. Move it back. Same difference.

Vicky

Mason J. Katz wrote:
> It looks like someone moved the profiles directory to profiles.orig.
> [snip]

--
Vicky Rowley email: vrowley at ucsd.edu
Biomedical Informatics Research Network work: (858) 536-5980
University of California, San Diego fax: (858) 822-0828
9500 Gilman Drive
La Jolla, CA 92093-0715

See pictures from our trip to China at http://www.sagacitech.com/Chinaweb

-- __--__--

Message: 11
Date: Wed, 10 Dec 2003 17:23:25 -0800 (PST)
From: Tim Carlson <tim.carlson at pnl.gov>
Subject: Re: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying to build CD distro
To: "V. Rowley" <vrowley at ucsd.edu>
Cc: "Mason J. Katz" <mjk at sdsc.edu>, npaci-rocks-discussion at sdsc.edu
Reply-to: Tim Carlson <tim.carlson at pnl.gov>

On Wed, 10 Dec 2003, V. Rowley wrote:

Did you remove python by chance? kickstart.cgi calls python directly in /usr/bin/python while rocks-dist does an "env python"
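That would tie the two symptoms in the traceback together: if kickstart.cgi never runs (the `sh: ./kickstart.cgi: No such file or directory` lines), the profile plausibly ends up with no 'packages' section, and looping over the resulting non-sequence is exactly the TypeError shown. A small stand-in illustration — the Profile class and its dict are hypothetical; only the getSection name is borrowed from the traceback:

```python
# Stand-in for the kickstart profile object used by build.py: getSection()
# plausibly returns None when kickstart.cgi produced no output, and
# iterating None is what "loop over non-sequence" complains about.
class Profile:
    def __init__(self, sections):
        self.sections = sections

    def getSection(self, name):
        # None when the section is missing, a list when it is present
        return self.sections.get(name)

broken = Profile({})                                # kickstart.cgi never ran
ok = Profile({'packages': ['rocks-ekv', 'expat']})  # normal case

try:
    for pkg in broken.getSection('packages'):
        pass
except TypeError:
    print('missing section -> TypeError, as in the rocks-dist traceback')

print(sorted(ok.getSection('packages')))
```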

Tim

> Yep, I did that, but only *AFTER* getting the error. [Thought it was
> generated by the rocks-dist sequence, but apparently not.] Go ahead.
> Move it back. Same difference.
>
> Vicky
> [snip]

-- __--__--

_______________________________________________
npaci-rocks-discussion mailing list
npaci-rocks-discussion at sdsc.edu
http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion

End of npaci-rocks-discussion Digest

DISCLAIMER:
This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its contents to any other person as it may be an offence under the Official Secrets Act. Thank you.

--__--__--

Message: 2
Date: Wed, 10 Dec 2003 18:03:41 -0800
From: Terrence Martin <tmartin at physics.ucsd.edu>
To: npaci-rocks-discussion at sdsc.edu
Subject: [Rocks-Discuss]Rocks 3.0.0

I am having a problem installing rocks 3.0.0 on my new cluster.

The python error occurs right after anaconda starts and just before the install asks for the roll CDROM.

The error refers to an inability to find or load rocks.file. The error is associated, I think, with the window that pops up and asks you to put the roll CDROM in.

The process I followed to get to this point is

Put the Rocks 3.0.0 CDROM into the CDROM drive
Boot the system
At the prompt type frontend
Wait till anaconda starts
Error referring to unable to load rocks.file

I have successfully installed rocks on a smaller cluster but that has different hardware. I used the same CDROM for both installs.

Any thoughts?

Terrence

--__--__--


Message: 3
Date: Wed, 10 Dec 2003 19:52:49 -0800
From: "V. Rowley" <vrowley at ucsd.edu>
To: npaci-rocks-discussion at sdsc.edu
Subject: Re: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying to build CD distro

Looks like python is okay:

> [root at rocks14 birn-oracle1]# which python
> /usr/bin/python
> [root at rocks14 birn-oracle1]# python --help
> Unknown option: --
> usage: python [option] ... [-c cmd | file | -] [arg] ...
> Options and arguments (and corresponding environment variables):
> -d     : debug output from parser (also PYTHONDEBUG=x)
> -i     : inspect interactively after running script, (also PYTHONINSPECT=x)
>          and force prompts, even if stdin does not appear to be a terminal
> -O     : optimize generated bytecode (a tad; also PYTHONOPTIMIZE=x)
> -OO    : remove doc-strings in addition to the -O optimizations
> -S     : don't imply 'import site' on initialization
> -t     : issue warnings about inconsistent tab usage (-tt: issue errors)
> -u     : unbuffered binary stdout and stderr (also PYTHONUNBUFFERED=x)
> -v     : verbose (trace import statements) (also PYTHONVERBOSE=x)
> -x     : skip first line of source, allowing use of non-Unix forms of #!cmd
> -X     : disable class based built-in exceptions
> -c cmd : program passed in as string (terminates option list)
> file   : program read from script file
> -      : program read from stdin (default; interactive mode if a tty)
> arg ...: arguments passed to program in sys.argv[1:]
> Other environment variables:
> PYTHONSTARTUP: file executed on interactive startup (no default)
> PYTHONPATH   : ':'-separated list of directories prefixed to the
>                default module search path. The result is sys.path.
> PYTHONHOME   : alternate <prefix> directory (or <prefix>:<exec_prefix>).
>                The default module search path uses <prefix>/python1.5.
> [root at rocks14 birn-oracle1]#

Tim Carlson wrote:
> On Wed, 10 Dec 2003, V. Rowley wrote:
>
> Did you remove python by chance? kickstart.cgi calls python directly in
> /usr/bin/python while rocks-dist does an "env python"
>
> Tim
> [snip]

--
Vicky Rowley email: vrowley at ucsd.edu
Biomedical Informatics Research Network work: (858) 536-5980
University of California, San Diego fax: (858) 822-0828
9500 Gilman Drive
La Jolla, CA 92093-0715


See pictures from our trip to China at http://www.sagacitech.com/Chinaweb

--__--__--

_______________________________________________
npaci-rocks-discussion mailing list
npaci-rocks-discussion at sdsc.edu
http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion

End of npaci-rocks-discussion Digest


From naihh at imcb.a-star.edu.sg Thu Dec 11 00:09:34 2003
From: naihh at imcb.a-star.edu.sg (Nai Hong Hwa Francis)
Date: Thu, 11 Dec 2003 16:09:34 +0800
Subject: [Rocks-Discuss]RE: Install rocks on Titan64 Superblade Classic with Dual Opteron 244
Message-ID: <5E118EED7CC277468A275F11EEEC39B94CCDBA@EXIMCB2.imcb.a-star.edu.sg>

Hi,

Has anyone successfully installed rocks on Titan64 Superblade Classic with Dual Opteron 244?

Nai Hong Hwa Francis
Institute of Molecular and Cell Biology (A*STAR)
30 Medical Drive
Singapore 117609.
DID: (65) 6874-6196

-----Original Message-----
From: npaci-rocks-discussion-request at sdsc.edu
[mailto:npaci-rocks-discussion-request at sdsc.edu]
Sent: Thursday, December 11, 2003 11:54 AM
To: npaci-rocks-discussion at sdsc.edu
Subject: npaci-rocks-discussion digest, Vol 1 #642 - 4 msgs

Send npaci-rocks-discussion mailing list submissions to
	npaci-rocks-discussion at sdsc.edu

To subscribe or unsubscribe via the World Wide Web, visit
	http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion
or, via email, send a message with subject or body 'help' to
	npaci-rocks-discussion-request at sdsc.edu

You can reach the person managing the list at
	npaci-rocks-discussion-admin at sdsc.edu

When replying, please edit your Subject line so it is more specific than "Re: Contents of npaci-rocks-discussion digest..."

Today's Topics:

1. RE: Do you have a list of the various models of Gigabit Ethernet Interfaces compatible to Rocks 3? (Nai Hong Hwa Francis)
2. Rocks 3.0.0 (Terrence Martin)
3. Re: "TypeError: loop over non-sequence" when trying to build CD distro (V. Rowley)

--__--__--

Message: 1
Date: Thu, 11 Dec 2003 09:45:18 +0800
From: "Nai Hong Hwa Francis" <naihh at imcb.a-star.edu.sg>
To: <npaci-rocks-discussion at sdsc.edu>
Subject: [Rocks-Discuss]RE: Do you have a list of the various models of Gigabit Ethernet Interfaces compatible to Rocks 3?

Hi All,

Do you have a list of the various gigabit Ethernet interfaces that are compatible with Rocks 3?

I am changing my nodes' connectivity from 10/100 to 1000.

Has anyone done that, and what are the differences in performance or turnaround time?

Has anyone successfully built a set of grid compute nodes using Rocks 3?

Thanks and Regards

Nai Hong Hwa Francis
Institute of Molecular and Cell Biology (A*STAR)
30 Medical Drive
Singapore 117609.
DID: (65) 6874-6196

-----Original Message-----
From: npaci-rocks-discussion-request at sdsc.edu
[mailto:npaci-rocks-discussion-request at sdsc.edu]
Sent: Thursday, December 11, 2003 9:25 AM
To: npaci-rocks-discussion at sdsc.edu
Subject: npaci-rocks-discussion digest, Vol 1 #641 - 13 msgs

Send npaci-rocks-discussion mailing list submissions to
	npaci-rocks-discussion at sdsc.edu


To subscribe or unsubscribe via the World Wide Web, visit
	http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion
or, via email, send a message with subject or body 'help' to
	npaci-rocks-discussion-request at sdsc.edu

You can reach the person managing the list at
	npaci-rocks-discussion-admin at sdsc.edu

When replying, please edit your Subject line so it is more specific than "Re: Contents of npaci-rocks-discussion digest..."

Today's Topics:

1. Non-homogenous legacy hardware (Chris Dwan (CCGB))
2. Error during Make when building a new install floppy (Terrence Martin)
3. Re: Error during Make when building a new install floppy (Tim Carlson)
4. Re: Non-homogenous legacy hardware (Tim Carlson)
5. ssh_known_hosts and ganglia (Jag)
6. Re: ssh_known_hosts and ganglia (Mason J. Katz)
7. "TypeError: loop over non-sequence" when trying to build CD distro (V. Rowley)
8. Re: one node short in "labels" (Greg Bruno)
9. Re: "TypeError: loop over non-sequence" when trying to build CD distro (Mason J. Katz)
10. Re: "TypeError: loop over non-sequence" when trying to build CD distro (V. Rowley)
11. Re: "TypeError: loop over non-sequence" when trying to build CD distro (Tim Carlson)

-- __--__--

Message: 1
Date: Wed, 10 Dec 2003 14:04:53 -0600 (CST)
From: "Chris Dwan (CCGB)" <cdwan at mail.ahc.umn.edu>
To: npaci-rocks-discussion at sdsc.edu
Subject: [Rocks-Discuss]Non-homogenous legacy hardware

I am integrating legacy systems into a ROCKS cluster, and have hit a snag with the auto-partition configuration: The new (old) systems have SCSI disks, while old (new) ones contain IDE. This is a non-issue so long as the initial install does its default partitioning. However, I have a "replace-auto-partition.xml" file which is unworkable for the SCSI based systems since it makes specific reference to "hda" rather than "sda."

I would like to have a site-nodes/replace-auto-partition.xml file with a conditional such that "hda" or "sda" is used, based on the name of the node (or some other criterion).

Is this possible?

Thanks, in advance. If this is out there on the mailing list archives, a pointer would be greatly appreciated.

-Chris Dwan
 The University of Minnesota

-- __--__--

Message: 2
Date: Wed, 10 Dec 2003 12:09:11 -0800
From: Terrence Martin <tmartin at physics.ucsd.edu>
To: npaci-rocks-discussion <npaci-rocks-discussion at sdsc.edu>
Subject: [Rocks-Discuss]Error during Make when building a new install floppy

I get the following error when I try to rebuild a boot floppy for rocks.

This is with the default CVS checkout with an update today according to the rocks userguide. I have not actually attempted to make any changes.

make[3]: Leaving directory `/home/install/rocks/src/rocks/boot/7.3/loader/anaconda-7.3/loader'
make[2]: Leaving directory `/home/install/rocks/src/rocks/boot/7.3/loader/anaconda-7.3'
strip -o loader anaconda-7.3/loader/loader
strip: anaconda-7.3/loader/loader: No such file or directory
make[1]: *** [loader] Error 1
make[1]: Leaving directory `/home/install/rocks/src/rocks/boot/7.3/loader'
make: *** [loader] Error 2

Of course I could avoid all of this altogether and just put my binary module into the appropriate location in the boot image.

Would it be correct to modify the following image file with my changes and then write it to a floppy via dd?

/home/install/ftp.rocksclusters.org/pub/rocks/rocks-3.0.0/rocks-dist/7.3/en/os/i386/images/bootnet.img

Basically I am injecting an updated e1000 driver with changes to pcitable to support the address of my gigabit cards.

Terrence

-- __--__--

Message: 3
Date: Wed, 10 Dec 2003 12:40:41 -0800 (PST)
From: Tim Carlson <tim.carlson at pnl.gov>
Subject: Re: [Rocks-Discuss]Error during Make when building a new install floppy
To: Terrence Martin <tmartin at physics.ucsd.edu>
Cc: npaci-rocks-discussion <npaci-rocks-discussion at sdsc.edu>
Reply-to: Tim Carlson <tim.carlson at pnl.gov>


On Wed, 10 Dec 2003, Terrence Martin wrote:

> I get the following error when I try to rebuild a boot floppy for rocks.

You can't make a boot floppy with Rocks 3.0. That isn't supported. Or at least it wasn't the last time I checked.

> Of course I could avoid all of this together and just put my binary
> module into the appropriate location in the boot image.
>
> Would it be correct to modify the following image file with my changes
> and then write it to a floppy via dd?
>
> /home/install/ftp.rocksclusters.org/pub/rocks/rocks-3.0.0/rocks-dist/7.3/en/os/i386/images/bootnet.img
>
> Basically I am injecting an updated e1000 driver with changes to
> pcitable to support the address of my gigabit cards.

Modifying the bootnet.img is about 1/3 of what you need to do if you go down that path. You also need to work on netstg1.img and you'll need to update the driver in the kernel rpm that gets installed on the box. None of this is trivial.

If it were me, I would go down the same path I took for updating the AIC79XX driver:

https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-October/003533.html

Tim

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support

-- __--__--

Message: 4
Date: Wed, 10 Dec 2003 12:52:38 -0800 (PST)
From: Tim Carlson <tim.carlson at pnl.gov>
Subject: Re: [Rocks-Discuss]Non-homogenous legacy hardware
To: "Chris Dwan (CCGB)" <cdwan at mail.ahc.umn.edu>
Cc: npaci-rocks-discussion at sdsc.edu
Reply-to: Tim Carlson <tim.carlson at pnl.gov>

On Wed, 10 Dec 2003, Chris Dwan (CCGB) wrote:

> I am integrating legacy systems into a ROCKS cluster, and have hit a
> snag with the auto-partition configuration: The new (old) systems have
> SCSI disks, while old (new) ones contain IDE. This is a non-issue so
> long as the initial install does its default partitioning. However, I
> have a "replace-auto-partition.xml" file which is unworkable for the SCSI
> based systems since it makes specific reference to "hda" rather than
> "sda."

If you have just a single drive, then you should be able to skip the "--ondisk" bits of your "part" command.

Otherwise, you would first have to do something ugly like the following:

http://penguin.epfl.ch/slides/kickstart/ks.cfg

You could probably (maybe) wrap most of that in an <eval sh="bash"></eval> block in the <main> block.

Just guessing.. haven't tried this.
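To make Tim's suggestion concrete, here is a rough sketch of the bash logic that could live inside such an <eval sh="bash"> block. Everything here is an assumption for illustration: the function names, the /proc/partitions probing, and the partition sizes are invented, and none of it has been tried on a real Rocks node.

```shell
# Emit "part" lines for whichever disk the node actually has.
# The partitions file is a parameter so the logic can be exercised
# off-node; inside a real <eval> block it would be /proc/partitions.
pick_root_disk() {
    # Prefer sda when the kernel reports a SCSI disk, else fall back to hda.
    if awk '{print $4}' "$1" | grep -qx sda; then
        echo sda
    else
        echo hda
    fi
}

emit_part_lines() {
    disk=$(pick_root_disk "$1")
    echo "part /    --size 4096 --ondisk $disk"
    echo "part swap --size 1024 --ondisk $disk"
    echo "part /var --size 2048 --ondisk $disk"
}
```

As Tim notes above, if each node has only one drive, simply dropping "--ondisk" from the part lines avoids the whole problem.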

Tim

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support

-- __--__--

Message: 5
From: Jag <agrajag at dragaera.net>
To: npaci-rocks-discussion at sdsc.edu
Date: Wed, 10 Dec 2003 13:21:07 -0500
Subject: [Rocks-Discuss]ssh_known_hosts and ganglia

I noticed a previous post on this list (https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-May/001934.html) indicating that Rocks distributes ssh keys for all the nodes over ganglia. Can anyone enlighten me as to how this is done?

I looked through the ganglia docs and didn't see anything indicating how to do this, so I'm assuming Rocks made some changes. Unfortunately the rocks iso images don't seem to contain srpms, so I'm now coming here. What did Rocks do to ganglia to make the distribution of ssh keys work?

Also, does anyone know where Rocks SRPMs can be found? I've done quite a bit of searching, but haven't found them anywhere.

-- __--__--

Message: 6
Cc: npaci-rocks-discussion at sdsc.edu
From: "Mason J. Katz" <mjk at sdsc.edu>
Subject: Re: [Rocks-Discuss]ssh_known_hosts and ganglia
Date: Wed, 10 Dec 2003 14:39:15 -0800
To: Jag <agrajag at dragaera.net>


Most of the SRPMS are on our FTP site, but we've screwed this up before. The SRPMS are entirely Rocks specific so they are of little value outside of Rocks. You can also checkout our CVS tree (cvs.rocksclusters.org) where rocks/src/ganglia shows what we add. We have a ganglia-python package we created to allow us to write our own metrics at a higher level than the provided gmetric application. We've also moved from this method to a single cluster-wide ssh key for Rocks 3.1.

-mjk

On Dec 10, 2003, at 10:21 AM, Jag wrote:

> I noticed a previous post on this list
> (https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-May/001934.html)
> indicating that Rocks distributes ssh keys for all the nodes over
> ganglia. Can anyone enlighten me as to how this is done?
>
> I looked through the ganglia docs and didn't see anything indicating
> how to do this, so I'm assuming Rocks made some changes. Unfortunately the
> rocks iso images don't seem to contain srpms, so I'm now coming here.
> What did Rocks do to ganglia to make the distribution of ssh keys work?
>
> Also, does anyone know where Rocks SRPMs can be found? I've done quite
> a bit of searching, but haven't found them anywhere.

-- __--__--

Message: 7
Date: Wed, 10 Dec 2003 14:43:49 -0800
From: "V. Rowley" <vrowley at ucsd.edu>
To: npaci-rocks-discussion at sdsc.edu
Subject: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying to build CD distro

When I run this:

[root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ; rocks-dist --dist=cdrom cdrom

on a server installed with ROCKS 3.0.0, I eventually get this:

> Cleaning distribution
> Resolving versions (RPMs)
> Resolving versions (SRPMs)
> Adding support for rebuild distribution from source
> Creating files (symbolic links - fast)
> Creating symlinks to kickstart files
> Fixing Comps Database
> Generating hdlist (rpm database)
> Patching second stage loader (eKV, partioning, ...)
>  patching "rocks-ekv" into distribution ...
>  patching "rocks-piece-pipe" into distribution ...
>  patching "PyXML" into distribution ...
>  patching "expat" into distribution ...
>  patching "rocks-pylib" into distribution ...
>  patching "MySQL-python" into distribution ...
>  patching "rocks-kickstart" into distribution ...
>  patching "rocks-kickstart-profiles" into distribution ...
>  patching "rocks-kickstart-dtds" into distribution ...
>  building CRAM filesystem ...
> Cleaning distribution
> Resolving versions (RPMs)
> Resolving versions (SRPMs)
> Creating symlinks to kickstart files
> Generating hdlist (rpm database)
> Segregating RPMs (rocks, non-rocks)
> sh: ./kickstart.cgi: No such file or directory
> sh: ./kickstart.cgi: No such file or directory
> Traceback (innermost last):
>   File "/opt/rocks/bin/rocks-dist", line 807, in ?
>     app.run()
>   File "/opt/rocks/bin/rocks-dist", line 623, in run
>     eval('self.command_%s()' % (command))
>   File "<string>", line 0, in ?
>   File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom
>     builder.build()
>   File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build
>     (rocks, nonrocks) = self.segregateRPMS()
>   File "/opt/rocks/lib/python/rocks/build.py", line 1107, in segregateRPMS
>     for pkg in ks.getSection('packages'):
> TypeError: loop over non-sequence

Any ideas?

--
Vicky Rowley                              email: vrowley at ucsd.edu
Biomedical Informatics Research Network   work: (858) 536-5980
University of California, San Diego       fax: (858) 822-0828
9500 Gilman Drive
La Jolla, CA 92093-0715

See pictures from our trip to China at http://www.sagacitech.com/Chinaweb

-- __--__--

Message: 8
Cc: rocks <npaci-rocks-discussion at sdsc.edu>
From: Greg Bruno <bruno at rocksclusters.org>
Subject: Re: [Rocks-Discuss]one node short in "labels"
Date: Wed, 10 Dec 2003 15:12:49 -0800
To: Vincent Fox <vincent_b_fox at yahoo.com>

> So I go to the "labels" selection on the web page to print out the
> pretty labels. What a nice idea by the way!
>
> EXCEPT....it's one node short! I go up to 0-13 and this stops at
> 0-12. Any ideas where I should check to fix this?

yeah, we found this corner case -- it'll be fixed in the next release.

thanks for the bug report.

- gb

-- __--__--

Message: 9
Cc: npaci-rocks-discussion at sdsc.edu
From: "Mason J. Katz" <mjk at sdsc.edu>
Subject: Re: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying to build CD distro
Date: Wed, 10 Dec 2003 15:16:27 -0800
To: "V. Rowley" <vrowley at ucsd.edu>

It looks like someone moved the profiles directory to profiles.orig.

-mjk

[root at rocks14 install]# ls -l
total 56
drwxr-sr-x    3 root     wheel        4096 Dec 10 21:16 cdrom
drwxrwsr-x    5 root     wheel        4096 Dec 10 20:38 contrib.orig
drwxr-sr-x    3 root     wheel        4096 Dec 10 21:07 ftp.rocksclusters.org
drwxr-sr-x    3 root     wheel        4096 Dec 10 20:38 ftp.rocksclusters.org.orig
-r-xrwsr-x    1 root     wheel       19254 Sep  3 12:40 kickstart.cgi
drwxr-xr-x    3 root     root         4096 Dec 10 20:38 profiles.orig
drwxr-sr-x    3 root     wheel        4096 Dec 10 21:15 rocks-dist
drwxrwsr-x    3 root     wheel        4096 Dec 10 20:38 rocks-dist.orig
drwxr-sr-x    3 root     wheel        4096 Dec 10 21:02 src
drwxr-sr-x    4 root     wheel        4096 Dec 10 20:49 src.foo

On Dec 10, 2003, at 2:43 PM, V. Rowley wrote:

> When I run this:
>
> [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ;
> rocks-dist --dist=cdrom cdrom
>
> on a server installed with ROCKS 3.0.0, I eventually get this:
>
>> Cleaning distribution
>> Resolving versions (RPMs)
>> Resolving versions (SRPMs)
>> Adding support for rebuild distribution from source
>> Creating files (symbolic links - fast)
>> Creating symlinks to kickstart files
>> Fixing Comps Database
>> Generating hdlist (rpm database)
>> Patching second stage loader (eKV, partioning, ...)
>>  patching "rocks-ekv" into distribution ...
>>  patching "rocks-piece-pipe" into distribution ...
>>  patching "PyXML" into distribution ...
>>  patching "expat" into distribution ...
>>  patching "rocks-pylib" into distribution ...
>>  patching "MySQL-python" into distribution ...
>>  patching "rocks-kickstart" into distribution ...
>>  patching "rocks-kickstart-profiles" into distribution ...
>>  patching "rocks-kickstart-dtds" into distribution ...
>>  building CRAM filesystem ...
>> Cleaning distribution
>> Resolving versions (RPMs)
>> Resolving versions (SRPMs)
>> Creating symlinks to kickstart files
>> Generating hdlist (rpm database)
>> Segregating RPMs (rocks, non-rocks)
>> sh: ./kickstart.cgi: No such file or directory
>> sh: ./kickstart.cgi: No such file or directory
>> Traceback (innermost last):
>>   File "/opt/rocks/bin/rocks-dist", line 807, in ?
>>     app.run()
>>   File "/opt/rocks/bin/rocks-dist", line 623, in run
>>     eval('self.command_%s()' % (command))
>>   File "<string>", line 0, in ?
>>   File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom
>>     builder.build()
>>   File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build
>>     (rocks, nonrocks) = self.segregateRPMS()
>>   File "/opt/rocks/lib/python/rocks/build.py", line 1107, in segregateRPMS
>>     for pkg in ks.getSection('packages'):
>> TypeError: loop over non-sequence
>
> Any ideas?
>
> --
> Vicky Rowley                              email: vrowley at ucsd.edu
> Biomedical Informatics Research Network   work: (858) 536-5980
> University of California, San Diego       fax: (858) 822-0828
> 9500 Gilman Drive
> La Jolla, CA 92093-0715
>
> See pictures from our trip to China at http://www.sagacitech.com/Chinaweb

-- __--__--

Message: 10
Date: Wed, 10 Dec 2003 16:50:16 -0800
From: "V. Rowley" <vrowley at ucsd.edu>
To: "Mason J. Katz" <mjk at sdsc.edu>
CC: npaci-rocks-discussion at sdsc.edu
Subject: Re: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying to build CD distro

Yep, I did that, but only *AFTER* getting the error. [Thought it was generated by the rocks-dist sequence, but apparently not.] Go ahead. Move it back. Same difference.

Vicky

Mason J. Katz wrote:
> It looks like someone moved the profiles directory to profiles.orig.
>
> -mjk
>
> [root at rocks14 install]# ls -l
> total 56
> drwxr-sr-x    3 root     wheel        4096 Dec 10 21:16 cdrom
> drwxrwsr-x    5 root     wheel        4096 Dec 10 20:38 contrib.orig
> drwxr-sr-x    3 root     wheel        4096 Dec 10 21:07 ftp.rocksclusters.org
> drwxr-sr-x    3 root     wheel        4096 Dec 10 20:38 ftp.rocksclusters.org.orig
> -r-xrwsr-x    1 root     wheel       19254 Sep  3 12:40 kickstart.cgi
> drwxr-xr-x    3 root     root         4096 Dec 10 20:38 profiles.orig
> drwxr-sr-x    3 root     wheel        4096 Dec 10 21:15 rocks-dist
> drwxrwsr-x    3 root     wheel        4096 Dec 10 20:38 rocks-dist.orig
> drwxr-sr-x    3 root     wheel        4096 Dec 10 21:02 src
> drwxr-sr-x    4 root     wheel        4096 Dec 10 20:49 src.foo
> On Dec 10, 2003, at 2:43 PM, V. Rowley wrote:
>
>> When I run this:
>>
>> [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ;
>> rocks-dist --dist=cdrom cdrom
>>
>> on a server installed with ROCKS 3.0.0, I eventually get this:
>>
>>> Cleaning distribution
>>> Resolving versions (RPMs)
>>> Resolving versions (SRPMs)
>>> Adding support for rebuild distribution from source
>>> Creating files (symbolic links - fast)
>>> Creating symlinks to kickstart files
>>> Fixing Comps Database
>>> Generating hdlist (rpm database)
>>> Patching second stage loader (eKV, partioning, ...)
>>>  patching "rocks-ekv" into distribution ...
>>>  patching "rocks-piece-pipe" into distribution ...
>>>  patching "PyXML" into distribution ...
>>>  patching "expat" into distribution ...
>>>  patching "rocks-pylib" into distribution ...
>>>  patching "MySQL-python" into distribution ...
>>>  patching "rocks-kickstart" into distribution ...
>>>  patching "rocks-kickstart-profiles" into distribution ...
>>>  patching "rocks-kickstart-dtds" into distribution ...
>>>  building CRAM filesystem ...
>>> Cleaning distribution
>>> Resolving versions (RPMs)
>>> Resolving versions (SRPMs)
>>> Creating symlinks to kickstart files
>>> Generating hdlist (rpm database)
>>> Segregating RPMs (rocks, non-rocks)
>>> sh: ./kickstart.cgi: No such file or directory
>>> sh: ./kickstart.cgi: No such file or directory
>>> Traceback (innermost last):
>>>   File "/opt/rocks/bin/rocks-dist", line 807, in ?
>>>     app.run()
>>>   File "/opt/rocks/bin/rocks-dist", line 623, in run
>>>     eval('self.command_%s()' % (command))
>>>   File "<string>", line 0, in ?
>>>   File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom
>>>     builder.build()
>>>   File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build
>>>     (rocks, nonrocks) = self.segregateRPMS()
>>>   File "/opt/rocks/lib/python/rocks/build.py", line 1107, in segregateRPMS
>>>     for pkg in ks.getSection('packages'):
>>> TypeError: loop over non-sequence
>>
>> Any ideas?
>>
>> --
>> Vicky Rowley                              email: vrowley at ucsd.edu
>> Biomedical Informatics Research Network   work: (858) 536-5980
>> University of California, San Diego       fax: (858) 822-0828
>> 9500 Gilman Drive
>> La Jolla, CA 92093-0715
>>
>> See pictures from our trip to China at http://www.sagacitech.com/Chinaweb

--
Vicky Rowley                              email: vrowley at ucsd.edu
Biomedical Informatics Research Network   work: (858) 536-5980
University of California, San Diego       fax: (858) 822-0828
9500 Gilman Drive
La Jolla, CA 92093-0715

See pictures from our trip to China at http://www.sagacitech.com/Chinaweb

-- __--__--

Message: 11
Date: Wed, 10 Dec 2003 17:23:25 -0800 (PST)
From: Tim Carlson <tim.carlson at pnl.gov>
Subject: Re: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying to build CD distro
To: "V. Rowley" <vrowley at ucsd.edu>
Cc: "Mason J. Katz" <mjk at sdsc.edu>, npaci-rocks-discussion at sdsc.edu
Reply-to: Tim Carlson <tim.carlson at pnl.gov>

On Wed, 10 Dec 2003, V. Rowley wrote:

Did you remove python by chance? kickstart.cgi calls python directly in /usr/bin/python while rocks-dist does an "env python".

Tim

> Yep, I did that, but only *AFTER* getting the error. [Thought it was
> generated by the rocks-dist sequence, but apparently not.] Go ahead.
> Move it back. Same difference.
>
> Vicky
>
> Mason J. Katz wrote:
> > It looks like someone moved the profiles directory to profiles.orig.
> >
> > -mjk
> >
> > [root at rocks14 install]# ls -l
> > total 56
> > drwxr-sr-x    3 root     wheel        4096 Dec 10 21:16 cdrom
> > drwxrwsr-x    5 root     wheel        4096 Dec 10 20:38 contrib.orig
> > drwxr-sr-x    3 root     wheel        4096 Dec 10 21:07 ftp.rocksclusters.org
> > drwxr-sr-x    3 root     wheel        4096 Dec 10 20:38 ftp.rocksclusters.org.orig
> > -r-xrwsr-x    1 root     wheel       19254 Sep  3 12:40 kickstart.cgi
> > drwxr-xr-x    3 root     root         4096 Dec 10 20:38 profiles.orig
> > drwxr-sr-x    3 root     wheel        4096 Dec 10 21:15 rocks-dist
> > drwxrwsr-x    3 root     wheel        4096 Dec 10 20:38 rocks-dist.orig
> > drwxr-sr-x    3 root     wheel        4096 Dec 10 21:02 src
> > drwxr-sr-x    4 root     wheel        4096 Dec 10 20:49 src.foo
> > On Dec 10, 2003, at 2:43 PM, V. Rowley wrote:
> >
> >> When I run this:
> >>
> >> [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ;
> >> rocks-dist --dist=cdrom cdrom
> >>
> >> on a server installed with ROCKS 3.0.0, I eventually get this:
> >>
> >>> Cleaning distribution
> >>> Resolving versions (RPMs)
> >>> Resolving versions (SRPMs)
> >>> Adding support for rebuild distribution from source
> >>> Creating files (symbolic links - fast)
> >>> Creating symlinks to kickstart files
> >>> Fixing Comps Database
> >>> Generating hdlist (rpm database)
> >>> Patching second stage loader (eKV, partioning, ...)
> >>>  patching "rocks-ekv" into distribution ...
> >>>  patching "rocks-piece-pipe" into distribution ...
> >>>  patching "PyXML" into distribution ...
> >>>  patching "expat" into distribution ...
> >>>  patching "rocks-pylib" into distribution ...
> >>>  patching "MySQL-python" into distribution ...
> >>>  patching "rocks-kickstart" into distribution ...
> >>>  patching "rocks-kickstart-profiles" into distribution ...
> >>>  patching "rocks-kickstart-dtds" into distribution ...
> >>>  building CRAM filesystem ...
> >>> Cleaning distribution
> >>> Resolving versions (RPMs)
> >>> Resolving versions (SRPMs)
> >>> Creating symlinks to kickstart files
> >>> Generating hdlist (rpm database)
> >>> Segregating RPMs (rocks, non-rocks)
> >>> sh: ./kickstart.cgi: No such file or directory
> >>> sh: ./kickstart.cgi: No such file or directory
> >>> Traceback (innermost last):
> >>>   File "/opt/rocks/bin/rocks-dist", line 807, in ?
> >>>     app.run()
> >>>   File "/opt/rocks/bin/rocks-dist", line 623, in run
> >>>     eval('self.command_%s()' % (command))
> >>>   File "<string>", line 0, in ?
> >>>   File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom
> >>>     builder.build()
> >>>   File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build
> >>>     (rocks, nonrocks) = self.segregateRPMS()
> >>>   File "/opt/rocks/lib/python/rocks/build.py", line 1107, in segregateRPMS
> >>>     for pkg in ks.getSection('packages'):
> >>> TypeError: loop over non-sequence
> >>
> >> Any ideas?
> >>
> >> --
> >> Vicky Rowley                              email: vrowley at ucsd.edu
> >> Biomedical Informatics Research Network   work: (858) 536-5980
> >> University of California, San Diego       fax: (858) 822-0828
> >> 9500 Gilman Drive
> >> La Jolla, CA 92093-0715
> >>
> >> See pictures from our trip to China at http://www.sagacitech.com/Chinaweb

-- __--__--

_______________________________________________
npaci-rocks-discussion mailing list
npaci-rocks-discussion at sdsc.edu
http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion

End of npaci-rocks-discussion Digest


--__--__--

Message: 2
Date: Wed, 10 Dec 2003 18:03:41 -0800
From: Terrence Martin <tmartin at physics.ucsd.edu>
To: npaci-rocks-discussion at sdsc.edu
Subject: [Rocks-Discuss]Rocks 3.0.0

I am having a problem on install of rocks 3.0.0 on my new cluster.

The python error occurs right after anaconda starts and just before the install asks for the roll CDROM.

The error refers to an inability to find or load rocks.file. I think the error is associated with the window that pops up and asks you to put the roll CDROM in.

The process I followed to get to this point is

Put the Rocks 3.0.0 CDROM into the CDROM drive
Boot the system
At the prompt, type "frontend"
Wait till anaconda starts
Error referring to being unable to load rocks.file

I have successfully installed rocks on a smaller cluster but that has different hardware. I used the same CDROM for both installs.

Any thoughts?

Terrence

--__--__--


Message: 3
Date: Wed, 10 Dec 2003 19:52:49 -0800
From: "V. Rowley" <vrowley at ucsd.edu>
To: npaci-rocks-discussion at sdsc.edu
Subject: Re: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying to build CD distro

Looks like python is okay:

> [root at rocks14 birn-oracle1]# which python
> /usr/bin/python
> [root at rocks14 birn-oracle1]# python --help
> Unknown option: --
> usage: python [option] ... [-c cmd | file | -] [arg] ...
> Options and arguments (and corresponding environment variables):
> -d     : debug output from parser (also PYTHONDEBUG=x)
> -i     : inspect interactively after running script, (also PYTHONINSPECT=x)
>          and force prompts, even if stdin does not appear to be a terminal
> -O     : optimize generated bytecode (a tad; also PYTHONOPTIMIZE=x)
> -OO    : remove doc-strings in addition to the -O optimizations
> -S     : don't imply 'import site' on initialization
> -t     : issue warnings about inconsistent tab usage (-tt: issue errors)
> -u     : unbuffered binary stdout and stderr (also PYTHONUNBUFFERED=x)
> -v     : verbose (trace import statements) (also PYTHONVERBOSE=x)
> -x     : skip first line of source, allowing use of non-Unix forms of #!cmd
> -X     : disable class based built-in exceptions
> -c cmd : program passed in as string (terminates option list)
> file   : program read from script file
> -      : program read from stdin (default; interactive mode if a tty)
> arg ...: arguments passed to program in sys.argv[1:]
> Other environment variables:
> PYTHONSTARTUP: file executed on interactive startup (no default)
> PYTHONPATH   : ':'-separated list of directories prefixed to the
>                default module search path. The result is sys.path.
> PYTHONHOME   : alternate <prefix> directory (or <prefix>:<exec_prefix>).
>                The default module search path uses <prefix>/python1.5.
> [root at rocks14 birn-oracle1]#

Tim Carlson wrote:
> On Wed, 10 Dec 2003, V. Rowley wrote:
>
> Did you remove python by chance? kickstart.cgi calls python directly in
> /usr/bin/python while rocks-dist does an "env python"
>
> Tim
>
>> Yep, I did that, but only *AFTER* getting the error. [Thought it was
>> generated by the rocks-dist sequence, but apparently not.] Go ahead.


>> Move it back. Same difference.
>>
>> Vicky
>>
>> Mason J. Katz wrote:
>>> It looks like someone moved the profiles directory to profiles.orig.
>>>
>>> -mjk
>>>
>>> [root at rocks14 install]# ls -l
>>> total 56
>>> drwxr-sr-x    3 root     wheel        4096 Dec 10 21:16 cdrom
>>> drwxrwsr-x    5 root     wheel        4096 Dec 10 20:38 contrib.orig
>>> drwxr-sr-x    3 root     wheel        4096 Dec 10 21:07 ftp.rocksclusters.org
>>> drwxr-sr-x    3 root     wheel        4096 Dec 10 20:38 ftp.rocksclusters.org.orig
>>> -r-xrwsr-x    1 root     wheel       19254 Sep  3 12:40 kickstart.cgi
>>> drwxr-xr-x    3 root     root         4096 Dec 10 20:38 profiles.orig
>>> drwxr-sr-x    3 root     wheel        4096 Dec 10 21:15 rocks-dist
>>> drwxrwsr-x    3 root     wheel        4096 Dec 10 20:38 rocks-dist.orig
>>> drwxr-sr-x    3 root     wheel        4096 Dec 10 21:02 src
>>> drwxr-sr-x    4 root     wheel        4096 Dec 10 20:49 src.foo
>>> On Dec 10, 2003, at 2:43 PM, V. Rowley wrote:
>>>
>>>> When I run this:
>>>>
>>>> [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ;
>>>> rocks-dist --dist=cdrom cdrom
>>>>
>>>> on a server installed with ROCKS 3.0.0, I eventually get this:
>>>>
>>>>> Cleaning distribution
>>>>> Resolving versions (RPMs)
>>>>> Resolving versions (SRPMs)
>>>>> Adding support for rebuild distribution from source
>>>>> Creating files (symbolic links - fast)
>>>>> Creating symlinks to kickstart files
>>>>> Fixing Comps Database
>>>>> Generating hdlist (rpm database)
>>>>> Patching second stage loader (eKV, partioning, ...)
>>>>>  patching "rocks-ekv" into distribution ...
>>>>>  patching "rocks-piece-pipe" into distribution ...
>>>>>  patching "PyXML" into distribution ...
>>>>>  patching "expat" into distribution ...
>>>>>  patching "rocks-pylib" into distribution ...
>>>>>  patching "MySQL-python" into distribution ...
>>>>>  patching "rocks-kickstart" into distribution ...
>>>>>  patching "rocks-kickstart-profiles" into distribution ...
>>>>>  patching "rocks-kickstart-dtds" into distribution ...
>>>>>  building CRAM filesystem ...
>>>>> Cleaning distribution
>>>>> Resolving versions (RPMs)
>>>>> Resolving versions (SRPMs)
>>>>> Creating symlinks to kickstart files
>>>>> Generating hdlist (rpm database)
>>>>> Segregating RPMs (rocks, non-rocks)
>>>>> sh: ./kickstart.cgi: No such file or directory
>>>>> sh: ./kickstart.cgi: No such file or directory
>>>>> Traceback (innermost last):
>>>>>   File "/opt/rocks/bin/rocks-dist", line 807, in ?
>>>>>     app.run()
>>>>>   File "/opt/rocks/bin/rocks-dist", line 623, in run
>>>>>     eval('self.command_%s()' % (command))
>>>>>   File "<string>", line 0, in ?
>>>>>   File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom
>>>>>     builder.build()
>>>>>   File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build
>>>>>     (rocks, nonrocks) = self.segregateRPMS()
>>>>>   File "/opt/rocks/lib/python/rocks/build.py", line 1107, in segregateRPMS
>>>>>     for pkg in ks.getSection('packages'):
>>>>> TypeError: loop over non-sequence
>>>>
>>>> Any ideas?
>>>>
>>>> --
>>>> Vicky Rowley                              email: vrowley at ucsd.edu
>>>> Biomedical Informatics Research Network   work: (858) 536-5980
>>>> University of California, San Diego       fax: (858) 822-0828
>>>> 9500 Gilman Drive
>>>> La Jolla, CA 92093-0715
>>>>
>>>> See pictures from our trip to China at http://www.sagacitech.com/Chinaweb

--
Vicky Rowley                              email: vrowley at ucsd.edu
Biomedical Informatics Research Network   work: (858) 536-5980
University of California, San Diego       fax: (858) 822-0828
9500 Gilman Drive
La Jolla, CA 92093-0715


See pictures from our trip to China at http://www.sagacitech.com/Chinaweb

--__--__--

_______________________________________________
npaci-rocks-discussion mailing list
npaci-rocks-discussion at sdsc.edu
http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion

End of npaci-rocks-discussion Digest


From wyzhong78 at msn.com  Thu Dec 11 07:27:39 2003
From: wyzhong78 at msn.com (zhong wenyu)
Date: Thu, 11 Dec 2003 23:27:39 +0800
Subject: [Rocks-Discuss]3.0.0 problem:Does my namd job allocate to each node?
Message-ID: <[email protected]>

I have built a rocks cluster with four dual-Xeon computers to run namd: one frontend and the other three as compute nodes. With intel's hyperthreading technology I have 16 cpus in all. Now I have some troubles; maybe someone can help me. I created the pbs script below, named mytask:

#!/bin/csh
#PBS -N NAMD
#PBS -m be
#PBS -l ncpus=8
#PBS -l nodes=2
#
cd $PBS_O_WORKDIR/
charmrun namd2 +p8 mytask.namd

i typed:
qsub mytask
qrun N

then i use qstat -f N

the message feedback showed (i'm sorry, i can't copy the original message, just the meaning):

host: compute-0-0/0+compute-0-0/1+compute-0-1/0+compute-0-1/1
cpu used: 8

it's strange why 4 hosts and 8 cpu used?


but when i looked at ganglia for the cluster status, it showed me only one node used (for example, compute-0-0); both of the other two were idle. i want to know whether the job was being done by one node or two. so i created a new task assigned specifically to compute-0-1, and the message feedback showed no resource available. when the task ended, i checked the information and found that the cpu time per step was half that of 4 cpus (1 node), but the whole time (including wall time) was equal. Does my namd job get allocated to each node? please help me! thanks


From bruno at rocksclusters.org  Thu Dec 11 07:55:17 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Thu, 11 Dec 2003 07:55:17 -0800
Subject: [Rocks-Discuss]ATLAS rpm build problems on PII platform
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

outstanding -- thanks for the patch!

i just committed the change to cvs. the fix will be reflected in the upcoming release (or immediately for anyone who has the rocks source tree checked out on their local frontend).

- gb

On Dec 10, 2003, at 10:43 PM, Vincent Fox wrote:

> Okay, here's the context diff as plain text. I test-applied it using
> "patch -p0 < atlas.patch" and did a compile on my PII box
> successfully. I can send it as attachment or submit to CVS or some
> other way if you need:
>
> *** atlas.spec.in.orig  Thu Dec 11 06:27:13 2003
> --- atlas.spec.in       Thu Dec 11 06:30:46 2003
> ***************
> *** 111,117 ****
> --- 111,133 ----
>   y
>   " | make
> + elif [ $CPUID -eq 4 ]
> + then
> + #
> + # Pentium II
> + #
> + echo "0
> + y
> + y
> + n
> + y
> + linux


> + 0
> + /usr/bin/g77
> + -O
> + y
> + " | make
>   else
>   #
>
> Greg Bruno <bruno at rocksclusters.org> wrote:
> > Okay, came up my own quick hack:
> >
> > Edit atlas.spec.in, go to "other x86" section, remove
> > 2 lines right above "linux", seems to make rpm now.
> >
> > A more formal patch would be put in a section for
> > cpuid eq 4 with this correction I suppose.
>
> if you provide the patch, we'll include it in our next release.
>
> - gb
>
> Do you Yahoo!?
> New Yahoo! Photos - easier uploading and sharing

From phil at sdsc.edu  Thu Dec 11 08:00:06 2003
From: phil at sdsc.edu (Philip Papadopoulos)
Date: Thu, 11 Dec 2003 12:00:06 -0400
Subject: [Rocks-Discuss]3.0.0 problem:Does my namd job allocate to each node?
Message-ID: <1920451470-1071158479-cardhu_blackberry.rim.net-21416-@engine05>

The important thing to understand is that pbs only gives an allocation of nodes (listed in the PBS_NODES environment variable) when the job is run. It is the user's responsibility to actually start the code on multiple nodes. This is the way pbs works on all platforms, not just rocks.

Pbs will start the submitted code (usually a script) on the first node listed in PBS_NODES. This environment variable is only available once the queued job is running. Your mytask script must explicitly start on the allocated nodes.

Pbs (actually maui) will pack jobs onto nodes by default, so allocating 8-cpu jobs to four nodes is normal, but changeable.

-p
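As a sketch of the launch step Philip describes, the mytask script could build a Charm++ nodelist from the allocation before calling charmrun. This assumes the standard PBS allocation file (usually exported as PBS_NODEFILE, one hostname per allocated CPU slot; the message above calls it PBS_NODES) and the charmrun ++nodelist file format; the function name and file paths are invented for illustration and untested on a real cluster.

```shell
# Build a charmrun nodelist from a PBS allocation file and print the
# number of allocated CPU slots (the value to pass as +pN).
make_nodelist() {
    nodefile=$1    # e.g. "$PBS_NODEFILE": one hostname per CPU slot
    out=$2         # nodelist file to hand to charmrun ++nodelist
    echo "group main" > "$out"
    # List each host once, preserving order of first appearance.
    awk '!seen[$1]++ {print "host " $1}' "$nodefile" >> "$out"
    wc -l < "$nodefile" | tr -d ' '
}
```

The job script would then run something like `NP=$(make_nodelist "$PBS_NODEFILE" nodelist)` followed by `charmrun namd2 ++nodelist nodelist +p"$NP" mytask.namd` (again, a guess, not a verified recipe).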

-----Original Message-----
From: "zhong wenyu" <wyzhong78 at msn.com>
Date: Thu, 11 Dec 2003 23:27:39
To: npaci-rocks-discussion at sdsc.edu
Subject: [Rocks-Discuss]3.0.0 problem:Does my namd job allocate to each node?

I have built a rocks cluster with four dual-Xeon computers to run namd: one frontend and the other three as compute nodes. With intel's hyperthreading technology I have 16 cpus in all. Now I have some troubles; maybe someone can help me. I created the pbs script below, named mytask:

#!/bin/csh
#PBS -N NAMD
#PBS -m be
#PBS -l ncpus=8
#PBS -l nodes=2
#
cd $PBS_O_WORKDIR/
charmrun namd2 +p8 mytask.namd

i typed:
qsub mytask
qrun N

then i use qstat -f N

the message feedback showed (i'm sorry, i can't copy the original message, just the meaning):

host: compute-0-0/0+compute-0-0/1+compute-0-1/0+compute-0-1/1
cpu used: 8

it's strange why 4 hosts and 8 cpus used? but when i looked at ganglia for the cluster status, it showed me only one node used (for example, compute-0-0); both of the other two were idle. i want to know whether the job was being done by one node or two. so i created a new task assigned specifically to compute-0-1, and the message feedback showed no resource available. when the task ended, i checked the information and found that the cpu time per step was half that of 4 cpus (1 node), but the whole time (including wall time) was equal. Does my namd job get allocated to each node? please help me! thanks


Sent via BlackBerry - a service from AT&T Wireless.

From jlkaiser at fnal.gov Thu Dec 11 08:28:08 2003
From: jlkaiser at fnal.gov (Joe Kaiser)
Date: Thu, 11 Dec 2003 10:28:08 -0600
Subject: [Rocks-Discuss]a name for pain ... modules/kernels/ethernets ...
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

Hi,

I'm sorry, I thought I sent email to the list reporting how I did this.

You have not said what motherboard you are using or what the error exactly is. The instructions below are for the X5DPA-GG and the error isn't reported as an error, I just get prompted to insert my driver.

If it is the X5DPA-GG then 3.0.0 will support the e1000 but you have to make a change to the pcitable on the initrd.img. The current pcitable on the initrd.img does NOT have the proper deviceId for the e1000 for this board. If you look in /etc/sysconfig/hwconf and search for the e1000, you will find this:

class: NETWORK
bus: PCI
detached: 0
device: eth
driver: e1000
desc: "Unknown vendor|Generic e1000 device"
vendorId: 8086
deviceId: 1013
subVendorId: 8086
subDeviceId: 1213
pciType: 1

The device ID is 1013. If you look in the pcitable that comes off of the initrd.img you will see that the highest the e1000 device ids go is 1012. Just add in the proper line to the initrd.img in your /tftpboot directory and it should work. Instructions are below.

Here are the instructions:

This should be done on the frontend:

cd /tftpboot/X86PC/UNDI/pxelinux/
cp initrd.img initrd.img.orig
cp initrd.img /tmp
cd /tmp
mv initrd.img initrd.gz
gunzip initrd.gz
mkdir /mnt/loop
mount -o loop initrd /mnt/loop
cd /mnt/loop/modules/
vi pcitable

Search for the e1000 drivers and add the following line:

0x8086  0x1013  "e1000" "Intel Corp.|82546EB Gigabit Ethernet Controller"

write the file

cd /tmp
umount /mnt/loop
gzip initrd
mv initrd.gz initrd.img
mv initrd.img /tftpboot/X86PC/UNDI/pxelinux/

Then boot the node.
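The append itself can be rehearsed on a scratch copy before touching the real file (a sketch we added for illustration; the real pcitable lives at /mnt/loop/modules/pcitable once the initrd is loop-mounted, and the "existing entry" line below is just a stand-in for the stock table contents):

```shell
#!/bin/sh
# Dry run of the pcitable edit on a scratch copy in /tmp.
PT=/tmp/pcitable.demo.$$
# Stand-in for the stock table, whose e1000 entries stop at 0x1012:
printf '0x8086\t0x1012\t"e1000"\t"existing entry"\n' > "$PT"
# The line the instructions above say to add:
printf '0x8086\t0x1013\t"e1000"\t"Intel Corp.|82546EB Gigabit Ethernet Controller"\n' >> "$PT"
# Confirm the new device id is now present:
grep -c '0x1013' "$PT"
```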

Hope this helps.

Thanks,

Joe

On Tue, 2003-12-09 at 15:59, Joe Landman wrote:
> Folks:
> 
>   As indicated previously, I am wrestling with a Supermicro based

> cluster. None of the RH distributions come with the correct E1000
> driver, so a new kernel is needed (in the boot CD, and for
> installation).
> 
> The problem I am running into is that it isn't at all obvious/easy how
> to install a new kernel/modules into ROCKS (3.0 or otherwise) to enable
> this thing to work. Following the examples in the documentation have
> not met with success. Running "rocks-dist cdrom" with the new kernels
> (2.4.23 works nicely on the nodes) in the force/RPMS directory generates
> a bootable CD with the original 2.4.18BOOT kernel.
> 
> What I (and I think others) need, is a simple/easy to follow method
> that will generate a bootable CD with the correct linux kernel, and the
> correct modules.
> 
> Is this in process somewhere? What would be tremendously helpful is
> if we can generate a binary module, and put that into the boot process
> by placing it into the force/modules/binary directory (assuming one
> exists) with the appropriate entry of a similar name in the
> force/modules/meta directory as a simple XML document giving pci-ids,
> description, name, etc.
> 
> Anything close to this coming? Modules are killing future ROCKS
> installs, the inability to easily inject a new module in there has
> created a problem whereby ROCKS does not function (as the underlying RH
> does not function).

-- 
===================================================================
Joe Kaiser - Systems Administrator

Fermi Lab CD/OSS-SCS          Never laugh at live dragons.
630-840-6444
jlkaiser at fnal.gov
===================================================================

From jghobrial at uh.edu Thu Dec 11 08:41:42 2003
From: jghobrial at uh.edu (Joseph)
Date: Thu, 11 Dec 2003 10:41:42 -0600 (CST)
Subject: [Rocks-Discuss]Re: Rocks Pythone Error with rocks.file
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

On Thu, 11 Dec 2003, Terrence Martin wrote:

> I am having the exact same error that you reported to the list on my
> cluster when I try to install rocks 3.0.0.
> 
> X tries to start, fails, then just before the HPC roll is supposed to
> start I get the python error about not being able to load the rocks.file.
> 
> The thing is that my system is a dual Xeon supermicro not AMD, so it
> must not be an AMD specific issue.
> 
> Did you ever find a resolution to the problem?
> 
> Thanks,
> 
> Terrence

Yes, I guess you should check your memory as Greg suggests, but my solution was to install the frontend on a different machine and then take the HD back to the original frontend. The only problem I had was that the build box was a single-processor setup, so when I went back to the dual-AMD box, pvfs failed because it was built against a non-SMP kernel. I installed the SMP kernel and noticed this problem.

It seems the problem may be related to an SMP issue, due to the fact that we both have SMP setups. I did not check the frontend's memory, so this may still be a factor, but I have had no trouble with the box since the installation.

My initial problem was a booting problem on the frontend due to a cdrom issue. All my other attempts at installing failed with the error you mentioned, but as I posted earlier, I tried 3 different AMD single-processor boxes and they all failed. The boxes are up all the time and stressed pretty hard, so I don't believe it is a memory issue.

This is some very strange behaviour.

Thanks,
Joseph

From shewa at inel.gov Thu Dec 11 10:02:59 2003
From: shewa at inel.gov (Andrew Shewmaker)
Date: Thu, 11 Dec 2003 11:02:59 -0700
Subject: [Rocks-Discuss]ssh_known_hosts and ganglia
Message-ID: <[email protected]>

"Mason J. Katz" <mjk at sdsc.edu> wrote:

> We've also moved from this method to a single cluster-wide ssh key for > Rocks 3.1.

How does a single key work? I have successfully set up ssh hostbased authentication for some non-Rocks systems using

http://www.omega.telia.net/vici/openssh/

(Note that OpenSSH_3.7.1p2 requires one more setting in addition
to those mentioned in the above url.

In <dir-of-ssh-conf-files>/ssh_config:
EnableSSHKeysign yes)

But I thought it still requires that each host in the cluster has a key... am I wrong? Do you do it differently?

Thanks,


Andrew

-- 
Andrew Shewmaker, Associate Engineer
Phone: 1-208-526-1415
Idaho National Eng. and Environmental Lab.
P.O. Box 1625, M.S. 3605
Idaho Falls, Idaho 83415-3605

From tmartin at physics.ucsd.edu Thu Dec 11 11:13:16 2003
From: tmartin at physics.ucsd.edu (Terrence Martin)
Date: Thu, 11 Dec 2003 11:13:16 -0800
Subject: [Rocks-Discuss]a name for pain ... modules/kernels/ethernets ...
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]>
Message-ID: <[email protected]>

Hi Joe,

Do you know if 2.3.2 can also benefit from the same small change?

Terrence

Joe Kaiser wrote:
> Hi,
> 
> I'm sorry, I thought I sent email to the list reporting how I did this.
> 
> You have not said what motherboard you are using or what the error
> exactly is. The instructions below are for the X5DPA-GG and the error
> isn't reported as an error, I just get prompted to insert my driver.

From tmartin at physics.ucsd.edu Thu Dec 11 11:19:55 2003
From: tmartin at physics.ucsd.edu (Terrence Martin)
Date: Thu, 11 Dec 2003 11:19:55 -0800
Subject: [Rocks-Discuss]Re: Rocks Pythone Error with rocks.file
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]>
Message-ID: <[email protected]>

I am fairly certain it is not the memory even without memtest86. I have in my office the same Supermicro 613A-Xi (SB-613A-Xi-B) with a SUPER X5DPA-GG motherboard as the ones at the SDSC but it is from a different vendor and completely different ram from another manufacturer.

When I put rocks 3.0.0 on it I get the crash of the installer in the same spot: right after the system attempts to start X windows and fails (either because X just fails to start, or because a mouse is not present), a python error comes up complaining that the rocks.file could not be found.

On the exact same system rocks 2.3.2 installs fine.

Terrence

Joseph wrote:
> Yes, I guess you should check your memory as Greg suggests, but my
> solution was to install the frontend on a different machine and then
> take the HD back to the original frontend.

From landman at scalableinformatics.com Thu Dec 11 11:42:14 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Thu, 11 Dec 2003 14:42:14 -0500
Subject: [Rocks-Discuss]a name for pain ... modules/kernels/ethernets ...
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]> <[email protected]>
Message-ID: <[email protected]>

Hi Terrence and Joe:

These are indeed X5DPA-GG. I am working on a device driver disk for 3.0 ROCKS. If this works, it is a weak hack, but it might be fine. More later (testing it now as we speak)..

Joe

On Thu, 2003-12-11 at 14:13, Terrence Martin wrote:
> Hi Joe,
> 
> Do you know if 2.3.2 can also benefit from the same small change?
> 
> Terrence

-- 
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
phone: +1 734 612 4615

From jlkaiser at fnal.gov Thu Dec 11 11:33:03 2003
From: jlkaiser at fnal.gov (Joe Kaiser)
Date: Thu, 11 Dec 2003 13:33:03 -0600
Subject: [Rocks-Discuss]a name for pain ... modules/kernels/ethernets ...
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]> <[email protected]>
Message-ID: <[email protected]>


I am not sure. Presumably, yes....

On Thu, 2003-12-11 at 13:13, Terrence Martin wrote:
> Hi Joe,
> 
> Do you know if 2.3.2 can also benefit from the same small change?
> 
> Terrence

-- 
===================================================================
Joe Kaiser - Systems Administrator

Fermi Lab CD/OSS-SCS          Never laugh at live dragons.
630-840-6444
jlkaiser at fnal.gov
===================================================================

From landman at scalableinformatics.com Thu Dec 11 11:51:51 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Thu, 11 Dec 2003 14:51:51 -0500
Subject: [Rocks-Discuss]driver disk for e1000 for rocks 3.0.0
Message-ID: <[email protected]>

Folks:

I have built a slightly modified RedHat 7.3 driver disk with the updated 5.2.22 e1000 driver. I verified that this does indeed work on my systems (during the initial portion of the ROCKS install, I can now insmod e1000 in the shell window and see the ethernet... this is a big change from before). If you want the driver disk, grab it from http://scalableinformatics.com/downloads/newdrv.img . To use it while installing a front end, type

frontend dd

at the boot prompt (not just frontend). I believe it should work for the compute nodes as well (i will test it soon). Now it is time to work around the rest of the Supermicro "features".

-- 
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
phone: +1 734 612 4615

From dtwright at uiuc.edu Thu Dec 11 12:32:54 2003
From: dtwright at uiuc.edu (Dan Wright)
Date: Thu, 11 Dec 2003 14:32:54 -0600
Subject: [Rocks-Discuss]3.0.0 problem:Does my namd job allocate to each node?
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

NAMD2 needs some more information to be started on multiple nodes like that. You need to give it a nodelist, in particular, so it knows where to run itself. We run namd2 on several clusters here (UIUC chemistry department).

Below is a script used to exec namd2 with the right options, etc, on a cluster. Below that is a script that automates the PBS job submission. Hope this helps!

- Dan Wright
(dtwright at uiuc.edu)
(http://www.scs.uiuc.edu/)
(UNIX Systems Administrator, School of Chemical Sciences)
(333-1728)


-- namd2.csh --

#!/bin/csh
# Script to run NAMD2 on the cluster automatically.
# Courtesy of Jim Phillips.

setenv CONV_RSH ssh
setenv TMPDIR /tmp
setenv BINDIR /home/NAMD

if ( $?PBS_JOBID ) then
    if ( $?PBS_NODEFILE ) then
        set nodes = `cat $PBS_NODEFILE`
    else
        set nodes = localhost
    endif
    set nodefile = $TMPDIR/namd2.nodelist.$PBS_JOBID
    echo group main >! $nodefile
    foreach node ( $nodes )
        echo host $node >> $nodefile
    end
    $BINDIR/charmrun $BINDIR/namd2 +p$#nodes ++nodelist $nodefile $*
else
    $BINDIR/charmrun $BINDIR/namd2 ++local $*
endif

-------------

Here's an example script using this to start namd2 on 8 uniprocessor nodes; you'd just run it as "namd2-8p <jobfile>" to automatically do the PBS job submission and everything.

-- namd2-8p --

#!/bin/bash
# This script runs namd2 on 8 nodes.
#

echo
echo "Please remember to specify the FULL PATH to your namd2 job file."
echo "If you haven't done that, please press ctrl-c now and re-run"
echo "this command with the full path."
echo
sleep 10

export SCRIPTFILE=/tmp/namd2-script.$USER.`date "+%s"`
export NAMD_SCRIPT=/usr/local/bin/namd2.csh

NAMD_CMD="$NAMD_SCRIPT $* > $HOME/namd2.out.`date '+%d%b%Y-%H:%M:%S'` 2>&1"

cat >$SCRIPTFILE <<EOF
#!/bin/bash
#PBS -l nodes=8

EOF
echo $NAMD_CMD >> $SCRIPTFILE
echo "exit" >> $SCRIPTFILE
/usr/apps/pbs/bin/qsub -V $SCRIPTFILE


sleep 5

rm -f $SCRIPTFILE

--------------

zhong wenyu said:
> Does my namd job allocate to each node?
> please help me!

- Dan Wright
(dtwright at uiuc.edu)
(http://www.uiuc.edu/~dtwright)

-] ------------------------------ [-] -------------------------------- [-
``Weave a circle round him thrice, / And close your eyes with holy dread,
For he on honeydew hath fed, / and drunk the milk of Paradise.''
    Samuel Taylor Coleridge, Kubla Khan


-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/attachments/20031211/417e39b4/attachment-0001.bin

From mjk at sdsc.edu Thu Dec 11 13:16:45 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Thu, 11 Dec 2003 13:16:45 -0800
Subject: [Rocks-Discuss]ssh_known_hosts and ganglia
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

Download 3.1 (out very soon now) and poke around. Basically there is a single SSH host key, and all the nodes have a copy. This kills the "man in the middle" warning every time you reinstall.
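As an illustration of why one shared key suffices (this is our sketch of the idea, not the actual Rocks 3.1 implementation; the key string and node names below are fake placeholders), a single ssh_known_hosts line can then cover every node:

```shell
#!/bin/sh
# When all nodes present the same host key, the client side needs only
# one known_hosts entry listing every hostname against that one key.
PUBKEY='ssh-rsa AAAAB3FAKEKEYFORDEMO root@cluster'   # fake placeholder key
NODES='frontend compute-0-0 compute-0-1 compute-0-2'  # placeholder names
HOSTLIST=`echo "$NODES" | tr ' ' ','`
KH=/tmp/ssh_known_hosts.demo
echo "$HOSTLIST $PUBKEY" > "$KH"
cat "$KH"
```

Reinstalling a node then changes nothing the client can see, which is why the man-in-the-middle warning goes away.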

-mjk

On Dec 11, 2003, at 10:02 AM, Andrew Shewmaker wrote:

> How does a single key work? I have successfully set up ssh host
> based authentication for some non-Rocks systems using
> 
> http://www.omega.telia.net/vici/openssh/
> 
> But I thought it still requires that each host has a key...
> am I wrong? Do you do it differently?
> 
> Andrew

From landman at scalableinformatics.com Thu Dec 11 13:36:44 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Thu, 11 Dec 2003 16:36:44 -0500
Subject: [Rocks-Discuss]ssh_known_hosts and ganglia
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]>
Message-ID: <[email protected]>

Hi Mason:

ETA? I have a non-functional cluster I think I can make function with 3.1. I would be happy to be a real-world beta/gamma tester for it (immediately, e.g. today). Please send me a URL. ...

Joe

On Thu, 2003-12-11 at 16:16, Mason J. Katz wrote:
> Download 3.1 (out very soon now) and poke around. Basically there is a
> single SSH host key, and all the nodes have a copy. This kills the
> "man in the middle" warning every time you reinstall.
> 
> -mjk

-- 
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
phone: +1 734 612 4615

From mjk at sdsc.edu Thu Dec 11 13:34:30 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Thu, 11 Dec 2003 13:34:30 -0800
Subject: [Rocks-Discuss]ssh_known_hosts and ganglia
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]> <[email protected]>
Message-ID: <[email protected]>

We're too close to send out more betas right now, but if something bad happens before Friday we'll reconsider. We are shooting for next week - but absolutely before the holidays. ho ho ho. We recognize that our delay in getting a current release out there is hurting new clusters, and just having the latest redhat kernel is going to fix most of these issues.

-mjk

On Dec 11, 2003, at 1:36 PM, Joe Landman wrote:

> Hi Mason:
> 
>   Eta? I have a non-functional cluster I think I can make function with
> 3.1. I would be happy to be a real world beta/gamma tester for it
> (immediately, eg. today). Please send me a URL. ...
> 
> Joe

From purikk at hotmail.com Thu Dec 11 15:06:17 2003
From: purikk at hotmail.com (Purushotham Komaravolu)
Date: Thu, 11 Dec 2003 18:06:17 -0500
Subject: [Rocks-Discuss]Kernal of Rocks 3.0
References: <[email protected]>
Message-ID: <[email protected]>

Hi, I am a newbie to Rocks and have a few questions. I would appreciate help
with those.

1) What kernel does the latest Rocks use? If it's not the latest, can I use
the latest kernel, and how?
2) Is there any way to have more than one frontend node for failover
redundancy?
3) Did anybody install the penguin compilers over the cluster?

Thanks,
Regards,
Puru

From bruno at rocksclusters.org Thu Dec 11 15:42:27 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Thu, 11 Dec 2003 15:42:27 -0800
Subject: [Rocks-Discuss]Kernal of Rocks 3.0
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]>
Message-ID: <[email protected]>

> 1) what kernel does latest rocks use, if its not latest can I use latest
> kernal and how?


our upcoming release (scheduled to release next week) has kernel version 2.4.21. additionally, the new release includes documentation on how to build your own kernel RPM from a kernel.org tarball.
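The kernel-rebuild procedure gb alludes to is documented in the 3.1 release itself; as a rough, untested outline it might look like the following. Version numbers and paths are placeholders, and the `make rpm` target is a 2.4-era kbuild feature, so fall back to `rpmbuild` with a spec file if your tree lacks it.

```shell
# Untested outline: roll a kernel RPM from a kernel.org tarball.
cd /usr/src
wget http://www.kernel.org/pub/linux/kernel/v2.4/linux-2.4.21.tar.bz2
tar xjf linux-2.4.21.tar.bz2
cd linux-2.4.21
cp /boot/config-$(uname -r) .config   # start from the running kernel's config
make oldconfig
make dep
make rpm        # builds a binary kernel RPM via scripts/mkspec
# The RPM lands under /usr/src/redhat/RPMS/<arch>/; copy it into the
# local distribution and rerun rocks-dist so compute nodes pick it up.
```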

> 2) is there any way to have more than 1 fronend nodes for failover
> redundancy?

no, that has not yet been implemented.

> 3) did anybody install penguin compilers over the cluster

i apologize, but i'm not familiar with the penguin compiler. we do have experience with gnu compilers, intel compilers and the portland group compilers. additionally, some folks in the rocks community have also successfully deployed the lahey compiler.

- gb

From oconnor at ucsd.edu Thu Dec 11 14:29:46 2003
From: oconnor at ucsd.edu (Edward O'Connor)
Date: Thu, 11 Dec 2003 14:29:46 -0800
Subject: [Rocks-Discuss]ia64 compute nodes with ia32 frontends?
In-Reply-To: <[email protected]> (Edward O'Connor's message of "Fri, 22 Aug 2003 15:39:05 -0700")
References: <[email protected]> <[email protected]>
Message-ID: <[email protected]>

Hi everybody,

I'm trying to bring up some ia64 compute nodes in a cluster with an ia32
frontend. Normally, `cd /home/install; rocks-dist mirror dist` only sets up
the frontend to handle ia32 compute nodes. I tried to manhandle
`rocks-dist mirror` into mirroring the ia64 stuff from ftp.rocksclusters.org
by giving it the --arch=ia64 option, but that didn't work, so I went ahead
and did the mirroring step by hand.

After having done so, `rocks-dist dist` still doesn't do the right thing.
So, adding --arch=ia64 to that command yields this error output:

,----
| # rocks-dist --arch=ia64 dist
| Cleaning distribution
| Resolving versions (RPMs)
| Resolving versions (SRPMs)
| Adding support for rebuild distribution from source
| Creating files (symbolic links - fast)
| Creating symlinks to kickstart files
| Fixing Comps Database
| error - comps file is missing, skipping this step
| Generating hdlist (rpm database)
| error - could not find rpm anaconda-runtime
| error - could not find genhdlist
| Patching second stage loader (eKV, partioning, ...)
| error - could not find second stage, skipping this step
`----


So my question is, what do I need to do to the ia32 frontend to enable it
to kickstart an ia64 compute node? Thanks.
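For the record, the by-hand mirroring step can be approximated as below. This is a sketch under assumptions, not a confirmed fix: the URL layout is guessed from the paths quoted above, and the error output suggests the real blocker is that the ia32 frontend lacks the ia64 anaconda-runtime/comps pieces that `rocks-dist dist` needs to assemble a kickstartable tree.

```shell
# Untested sketch: pull the ia64 tree down next to the existing ia32 one.
cd /home/install/ftp.rocksclusters.org/pub/rocks/rocks-3.0.0
wget --mirror --no-parent --no-host-directories --cut-dirs=3 \
    http://ftp.rocksclusters.org/pub/rocks/rocks-3.0.0/ia64/

# Then ask rocks-dist for an ia64 distribution explicitly:
cd /home/install
rocks-dist --arch=ia64 dist
```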

Ted

--
Edward O'Connor
oconnor at ucsd.edu

From gotero at linuxprophet.com Thu Dec 11 21:14:33 2003
From: gotero at linuxprophet.com (Glen Otero)
Date: Thu, 11 Dec 2003 21:14:33 -0800
Subject: Fwd: [Rocks-Discuss]RE: Have anyone successfully build a set of grid compute nodes using Rocks?
Message-ID: <[email protected]>

> We put two Itanium clusters and an x86 cluster together on a grid at
> SC2003 using Rocks 3.1 beta and the Grid Roll. Simple CA is installed
> on the cluster frontends for you, so all one has to do is create and
> exchange certificates and update the grid-mapfiles. This grid was a
> joint collaboration between SDSC, Promicro Systems and Callident.
>
> On Dec 11, 2003, at 12:08 AM, Nai Hong Hwa Francis wrote:
>
>> Hi,
>>
>> Have anyone successfully build a set of grid compute nodes using Rocks
>> 3?
>> Anyone care to share?
>>
>> Nai Hong Hwa Francis
>> Institute of Molecular and Cell Biology (A*STAR)
>> 30 Medical Drive
>> Singapore 117609.
>> DID: (65) 6874-6196
>>
>> -----Original Message-----
>> From: npaci-rocks-discussion-request at sdsc.edu
>> [mailto:npaci-rocks-discussion-request at sdsc.edu]
>> Sent: Thursday, December 11, 2003 11:54 AM
>> To: npaci-rocks-discussion at sdsc.edu
>> Subject: npaci-rocks-discussion digest, Vol 1 #642 - 4 msgs
>>
>> Send npaci-rocks-discussion mailing list submissions to
>> npaci-rocks-discussion at sdsc.edu
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>> http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion
>> or, via email, send a message with subject or body 'help' to


>> npaci-rocks-discussion-request at sdsc.edu>>>> You can reach the person managing the list at>> npaci-rocks-discussion-admin at sdsc.edu>>>> When replying, please edit your Subject line so it is more specific>> than "Re: Contents of npaci-rocks-discussion digest...">>>>>> Today's Topics:>>>> 1. RE: Do you have a list of the various models of Gigabit Ethernet>> Interfaces compatible to Rocks 3? (Nai Hong Hwa Francis)>> 2. Rocks 3.0.0 (Terrence Martin)>> 3. Re: "TypeError: loop over non-sequence" when trying>> to build CD distro (V. Rowley)>>>> --__--__-->>>> Message: 1>> Date: Thu, 11 Dec 2003 09:45:18 +0800>> From: "Nai Hong Hwa Francis" <naihh at imcb.a-star.edu.sg>>> To: <npaci-rocks-discussion at sdsc.edu>>> Subject: [Rocks-Discuss]RE: Do you have a list of the various models >> of>> Gigabit Ethernet Interfaces compatible to Rocks 3?>>>>>>>> Hi All,>>>> Do you have a list of the various gigabit Ethernet interfaces that are>> compatible to Rocks 3?>>>> I am changing my nodes connectivity from 10/100 to 1000.>>>> Have anyone done that and how are the differences in performance or>> turnaround time?>>>>>>>> Thanks and Regards>>>> Nai Hong Hwa Francis>> Institute of Molecular and Cell Biology (A*STAR)>> 30 Medical Drive>> Singapore 117609.>> DID: (65) 6874-6196>>>> -----Original Message----->> From: npaci-rocks-discussion-request at sdsc.edu>> [mailto:npaci-rocks-discussion-request at sdsc.edu]=20>> Sent: Thursday, December 11, 2003 9:25 AM>> To: npaci-rocks-discussion at sdsc.edu>> Subject: npaci-rocks-discussion digest, Vol 1 #641 - 13 msgs>>>> Send npaci-rocks-discussion mailing list submissions to>> npaci-rocks-discussion at sdsc.edu>>

Page 163: 2003 December

>> To subscribe or unsubscribe via the World Wide Web, visit>> =09>> http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion>> or, via email, send a message with subject or body 'help' to>> npaci-rocks-discussion-request at sdsc.edu>>>> You can reach the person managing the list at>> npaci-rocks-discussion-admin at sdsc.edu>>>> When replying, please edit your Subject line so it is more specific>> than "Re: Contents of npaci-rocks-discussion digest...">>>>>> Today's Topics:>>>> 1. Non-homogenous legacy hardware (Chris Dwan (CCGB))>> 2. Error during Make when building a new install floppy (Terrence>> Martin)>> 3. Re: Error during Make when building a new install floppy (Tim>> Carlson)>> 4. Re: Non-homogenous legacy hardware (Tim Carlson)>> 5. ssh_known_hosts and ganglia (Jag)>> 6. Re: ssh_known_hosts and ganglia (Mason J. Katz)>> 7. "TypeError: loop over non-sequence" when trying to build CD>> distro (V. Rowley)>> 8. Re: one node short in "labels" (Greg Bruno)>> 9. Re: "TypeError: loop over non-sequence" when trying to build CD>> distro (Mason J. Katz)>> 10. Re: "TypeError: loop over non-sequence" when trying>> to build CD distro (V. Rowley)>> 11. Re: "TypeError: loop over non-sequence" when trying to>> build CD distro (Tim Carlson)>>>> -- __--__-- >> Message: 1>> Date: Wed, 10 Dec 2003 14:04:53 -0600 (CST)>> From: "Chris Dwan (CCGB)" <cdwan at mail.ahc.umn.edu>>> To: npaci-rocks-discussion at sdsc.edu>> Subject: [Rocks-Discuss]Non-homogenous legacy hardware>>>>>> I am integrating legacy systems into a ROCKS cluster, and have hit a>> snag with the auto-partition configuration: The new (old) systems >> have>> SCSI disks, while old (new) ones contain IDE. This is a non-issue so>> long as the initial install does its default partitioning. 
However, I
>> have a "replace-auto-partition.xml" file which is unworkable for the SCSI
>> based systems since it makes specific reference to "hda" rather than
>> "sda."
>>
>> I would like to have a site-nodes/replace-auto-partition.xml file with a
>> conditional such that "hda" or "sda" is used, based on the name of the
>> node (or some other criterion).
>>
>> Is this possible?
>>
>> Thanks, in advance. If this is out there on the mailing list


>> archives,>> a>> pointer would be greatly appreciated.>>>> -Chris Dwan>> The University of Minnesota>>>> -- __--__-- >> Message: 2>> Date: Wed, 10 Dec 2003 12:09:11 -0800>> From: Terrence Martin <tmartin at physics.ucsd.edu>>> To: npaci-rocks-discussion <npaci-rocks-discussion at sdsc.edu>>> Subject: [Rocks-Discuss]Error during Make when building a new install>> floppy>>>> I get the following error when I try to rebuild a boot floppy for >> rocks.>>>> This is with the default CVS checkout with an update today according>> to=20>> the rocks userguide. I have not actually attempted to make any >> changes.>>>> make[3]: Leaving directory=20>> `/home/install/rocks/src/rocks/boot/7.3/loader/anaconda-7.3/loader'>> make[2]: Leaving directory=20>> `/home/install/rocks/src/rocks/boot/7.3/loader/anaconda-7.3'>> strip -o loader anaconda-7.3/loader/loader>> strip: anaconda-7.3/loader/loader: No such file or directory>> make[1]: *** [loader] Error 1>> make[1]: Leaving directory>> `/home/install/rocks/src/rocks/boot/7.3/loader'>> make: *** [loader] Error 2>>>> Of course I could avoid all of this together and just put my binary=20>> module into the appropriate location in the boot image.>>>> Would it be correct to modify the following image file with my>> changes=20>> and then write it to a floppy via dd?>>>> /home/install/ftp.rocksclusters.org/pub/rocks/rocks-3.0.0/rocks-dist/ >> 7.3>> /en/os/i386/images/bootnet.img>>>> Basically I am injecting an updated e1000 driver with changes to=20>> pcitable to support the address of my gigabit cards.>>>> Terrence>>>>>> -- __--__-->> Message: 3>> Date: Wed, 10 Dec 2003 12:40:41 -0800 (PST)>> From: Tim Carlson <tim.carlson at pnl.gov>>> Subject: Re: [Rocks-Discuss]Error during Make when building a new>> install floppy>> To: Terrence Martin <tmartin at physics.ucsd.edu>>> Cc: npaci-rocks-discussion <npaci-rocks-discussion at sdsc.edu>


>> Reply-to: Tim Carlson <tim.carlson at pnl.gov>>>>> On Wed, 10 Dec 2003, Terrence Martin wrote:>>>>> I get the following error when I try to rebuild a boot floppy for>> rocks.>>>>>>> You can't make a boot floppy with Rocks 3.0. That isn't supported. Or >> at>> least it wasn't the last time I checked>>>>> Of course I could avoid all of this together and just put my binary>>> module into the appropriate location in the boot image.>>>>>> Would it be correct to modify the following image file with my >>> changes>>> and then write it to a floppy via dd?>>>>>>>> /home/install/ftp.rocksclusters.org/pub/rocks/rocks-3.0.0/rocks-dist/ >> 7.3>> /en/os/i386/images/bootnet.img>>>>>> Basically I am injecting an updated e1000 driver with changes to>>> pcitable to support the address of my gigabit cards.>>>> Modifiying the bootnet.img is about 1/3 of what you need to do if you >> go>> down that path. You also need to work on netstg1.img and you'll need >> to>> update the drive in the kernel rpm that gets installed on the box. >> None>> of>> this is trivial.>>>> If it were me, I would go down the same path I took for updating the>> AIC79XX driver>>>> https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-October/ >> 003>> 533.html>>>> Tim>>>> Tim Carlson>> Voice: (509) 376 3423>> Email: Tim.Carlson at pnl.gov>> EMSL UNIX System Support>>>>>> -- __--__-->> Message: 4>> Date: Wed, 10 Dec 2003 12:52:38 -0800 (PST)>> From: Tim Carlson <tim.carlson at pnl.gov>>> Subject: Re: [Rocks-Discuss]Non-homogenous legacy hardware>> To: "Chris Dwan (CCGB)" <cdwan at mail.ahc.umn.edu>>> Cc: npaci-rocks-discussion at sdsc.edu>> Reply-to: Tim Carlson <tim.carlson at pnl.gov>


>>
>> On Wed, 10 Dec 2003, Chris Dwan (CCGB) wrote:
>>
>>> I am integrating legacy systems into a ROCKS cluster, and have hit a
>>> snag with the auto-partition configuration: The new (old) systems have
>>> SCSI disks, while old (new) ones contain IDE. This is a non-issue so
>>> long as the initial install does its default partitioning. However, I
>>> have a "replace-auto-partition.xml" file which is unworkable for the SCSI
>>> based systems since it makes specific reference to "hda" rather than
>>> "sda."
>>
>> If you have just a single drive, then you should be able to skip the
>> "--ondisk" bits of your "part" command
>>
>> Otherwise, you would have first to do something ugly like the following:
>>
>> http://penguin.epfl.ch/slides/kickstart/ks.cfg
>>
>> You could probably (maybe) wrap most of that in an
>> <eval sh="bash">
>> </eval>
>> block in the <main> block.
>>
>> Just guessing.. haven't tried this.
>>
>> Tim
>>
>> Tim Carlson
>> Voice: (509) 376 3423
>> Email: Tim.Carlson at pnl.gov
>> EMSL UNIX System Support
>>
>>
>> --__--__--
>> Message: 5
>> From: Jag <agrajag at dragaera.net>
>> To: npaci-rocks-discussion at sdsc.edu
>> Date: Wed, 10 Dec 2003 13:21:07 -0500
>> Subject: [Rocks-Discuss]ssh_known_hosts and ganglia
>>
>> I noticed a previous post on this list
>> (https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-May/001934.html)
>> indicating that Rocks distributes ssh keys for all the nodes over
>> ganglia. Can anyone enlighten me as to how this is done?
>>
>> I looked through the ganglia docs and didn't see anything indicating how
>> to do this, so I'm assuming Rocks made some changes. Unfortunately the
>> rocks iso images don't seem to contain srpms, so I'm now coming here.
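Tim's single-drive suggestion above, dropping the "--ondisk" argument so the installer picks hda or sda itself, would look roughly like this as a site-nodes/replace-auto-partition.xml. The element layout follows the usual Rocks node-file convention and the sizes are placeholders, so compare against a stock profile before relying on it:

```xml
<?xml version="1.0" standalone="no"?>
<kickstart>
  <main>
    <!-- No "--ondisk hda/sda": the installer chooses the first drive,
         so the same file works on both IDE and SCSI nodes. -->
    <part> /     --size 4096 </part>
    <part> swap  --size 1024 </part>
    <part> /state/partition1 --size 1 --grow </part>
  </main>
</kickstart>
```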


>> What did Rocks do to ganglia to make the distribution of ssh keys
>> work?
>>
>> Also, does anyone know where Rocks SRPMs can be found? I've done quite
>> a bit of searching, but haven't found them anywhere.
>>
>>
>> --__--__--
>> Message: 6
>> Cc: npaci-rocks-discussion at sdsc.edu
>> From: "Mason J. Katz" <mjk at sdsc.edu>
>> Subject: Re: [Rocks-Discuss]ssh_known_hosts and ganglia
>> Date: Wed, 10 Dec 2003 14:39:15 -0800
>> To: Jag <agrajag at dragaera.net>
>>
>> Most of the SRPMS are on our FTP site, but we've screwed this up
>> before. The SRPMS are entirely Rocks specific so they are of little
>> value outside of Rocks. You can also check out our CVS tree
>> (cvs.rocksclusters.org) where rocks/src/ganglia shows what we add. We
>> have a ganglia-python package we created to allow us to write our own
>> metrics at a higher level than the provided gmetric application. We've
>> also moved from this method to a single cluster-wide ssh key for Rocks
>> 3.1.
>>
>> -mjk
>>
>> On Dec 10, 2003, at 10:21 AM, Jag wrote:
>>
>>> I noticed a previous post on this list
>>> (https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-May/001934.html)
>>> indicating that Rocks distributes ssh keys for all the nodes over
>>> ganglia. Can anyone enlighten me as to how this is done?
>>>
>>> I looked through the ganglia docs and didn't see anything indicating how
>>> to do this, so I'm assuming Rocks made some changes. Unfortunately the
>>> rocks iso images don't seem to contain srpms, so I'm now coming here.
>>> What did Rocks do to ganglia to make the distribution of ssh keys
>>> work?
>>>
>>> Also, does anyone know where Rocks SRPMs can be found? I've done quite
>>> a bit of searching, but haven't found them anywhere.


>> To: npaci-rocks-discussion at sdsc.edu>> Subject: [Rocks-Discuss]"TypeError: loop over non-sequence" when >> trying>> to build CD distro>>>> When I run this:>>>> [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ; >> rocks-dist>>>> --dist=3Dcdrom cdrom>>>> on a server installed with ROCKS 3.0.0, I eventually get this:>>>>> Cleaning distribution>>> Resolving versions (RPMs)>>> Resolving versions (SRPMs)>>> Adding support for rebuild distribution from source>>> Creating files (symbolic links - fast)>>> Creating symlinks to kickstart files>>> Fixing Comps Database>>> Generating hdlist (rpm database)>>> Patching second stage loader (eKV, partioning, ...)>>> patching "rocks-ekv" into distribution ...>>> patching "rocks-piece-pipe" into distribution ...>>> patching "PyXML" into distribution ...>>> patching "expat" into distribution ...>>> patching "rocks-pylib" into distribution ...>>> patching "MySQL-python" into distribution ...>>> patching "rocks-kickstart" into distribution ...>>> patching "rocks-kickstart-profiles" into distribution ...>>> patching "rocks-kickstart-dtds" into distribution ...>>> building CRAM filesystem ...>>> Cleaning distribution>>> Resolving versions (RPMs)>>> Resolving versions (SRPMs)>>> Creating symlinks to kickstart files>>> Generating hdlist (rpm database)>>> Segregating RPMs (rocks, non-rocks)>>> sh: ./kickstart.cgi: No such file or directory>>> sh: ./kickstart.cgi: No such file or directory>>> Traceback (innermost last):>>> File "/opt/rocks/bin/rocks-dist", line 807, in ?>>> app.run()>>> File "/opt/rocks/bin/rocks-dist", line 623, in run>>> eval('self.command_%s()' % (command))>>> File "<string>", line 0, in ?>>> File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom>>> builder.build()>>> File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build>>> (rocks, nonrocks) =3D self.segregateRPMS()>>> File "/opt/rocks/lib/python/rocks/build.py", line 1107, in>> segregateRPMS>>> for pkg in ks.getSection('packages'):>>> 
TypeError: loop over non-sequence>>>> Any ideas?>>>> --=20
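Given the diagnosis later in the thread (a profiles directory renamed to profiles.orig, `sh: ./kickstart.cgi: No such file or directory`, and a possibly missing /usr/bin/python), a quick pre-flight check before running the `rocks-dist --dist=cdrom cdrom` step might look like this. The checks are assumptions about the Rocks 3.0 /home/install layout, not an official tool:

```shell
# Sanity-check the /home/install layout that rocks-dist's cdrom build
# shells out to; each missing piece is reported here instead of
# surfacing later as "TypeError: loop over non-sequence".
check_install_dir() {
    dir=$1
    [ -x "$dir/kickstart.cgi" ] || echo "kickstart.cgi missing or not executable"
    [ -d "$dir/profiles" ]      || echo "profiles/ missing (renamed to profiles.orig?)"
}
check_install_dir /home/install   # path used on a real Rocks 3.0 frontend
```

The root cause is that `ks.getSection('packages')` returns nothing when kickstart.cgi cannot run, and the old Python 1.5-era code loops over that result without checking it first.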


>> Vicky Rowley email: vrowley at ucsd.edu>> Biomedical Informatics Research Network work: (858) 536-5980>> University of California, San Diego fax: (858) 822-0828>> 9500 Gilman Drive>> La Jolla, CA 92093-0715>>>>>> See pictures from our trip to China at>> http://www.sagacitech.com/Chinaweb>>>>>> -- __--__-->> Message: 8>> Cc: rocks <npaci-rocks-discussion at sdsc.edu>>> From: Greg Bruno <bruno at rocksclusters.org>>> Subject: Re: [Rocks-Discuss]one node short in "labels">> Date: Wed, 10 Dec 2003 15:12:49 -0800>> To: Vincent Fox <vincent_b_fox at yahoo.com>>>>>> So I go to the "labels" selection on the web page to print out =>> the=3D20>>> pretty labels. What a nice idea by the way!>>> =3DA0>>> EXCEPT....it's one node short! I go up to 0-13 and this stops at=3D20>>> 0-12.=3DA0 Any ideas where I should check to fix this?>>>> yeah, we found this corner case -- it'll be fixed in the next release.>>>> thanks for bug report.>>>> - gb>>>>>> -- __--__-->> Message: 9>> Cc: npaci-rocks-discussion at sdsc.edu>> From: "Mason J. Katz" <mjk at sdsc.edu>>> Subject: Re: [Rocks-Discuss]"TypeError: loop over non-sequence" when>> trying to build CD distro>> Date: Wed, 10 Dec 2003 15:16:27 -0800>> To: "V. Rowley" <vrowley at ucsd.edu>>>>> It looks like someone moved the profiles directory to profiles.orig.>>>> -mjk>>>>>> [root at rocks14 install]# ls -l>> total 56>> drwxr-sr-x 3 root wheel 4096 Dec 10 21:16 cdrom>> drwxrwsr-x 5 root wheel 4096 Dec 10 20:38 contrib.orig>> drwxr-sr-x 3 root wheel 4096 Dec 10 21:07=20>> ftp.rocksclusters.org>> drwxr-sr-x 3 root wheel 4096 Dec 10 20:38=20>> ftp.rocksclusters.org.orig>> -r-xrwsr-x 1 root wheel 19254 Sep 3 12:40 kickstart.cgi>> drwxr-xr-x 3 root root 4096 Dec 10 20:38 profiles.orig>> drwxr-sr-x 3 root wheel 4096 Dec 10 21:15 rocks-dist>> drwxrwsr-x 3 root wheel 4096 Dec 10 20:38


>> rocks-dist.orig>> drwxr-sr-x 3 root wheel 4096 Dec 10 21:02 src>> drwxr-sr-x 4 root wheel 4096 Dec 10 20:49 src.foo>> On Dec 10, 2003, at 2:43 PM, V. Rowley wrote:>>>>> When I run this:>>>>>> [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ;=20>>> rocks-dist --dist=3Dcdrom cdrom>>>>>> on a server installed with ROCKS 3.0.0, I eventually get this:>>>>>>> Cleaning distribution>>>> Resolving versions (RPMs)>>>> Resolving versions (SRPMs)>>>> Adding support for rebuild distribution from source>>>> Creating files (symbolic links - fast)>>>> Creating symlinks to kickstart files>>>> Fixing Comps Database>>>> Generating hdlist (rpm database)>>>> Patching second stage loader (eKV, partioning, ...)>>>> patching "rocks-ekv" into distribution ...>>>> patching "rocks-piece-pipe" into distribution ...>>>> patching "PyXML" into distribution ...>>>> patching "expat" into distribution ...>>>> patching "rocks-pylib" into distribution ...>>>> patching "MySQL-python" into distribution ...>>>> patching "rocks-kickstart" into distribution ...>>>> patching "rocks-kickstart-profiles" into distribution ...>>>> patching "rocks-kickstart-dtds" into distribution ...>>>> building CRAM filesystem ...>>>> Cleaning distribution>>>> Resolving versions (RPMs)>>>> Resolving versions (SRPMs)>>>> Creating symlinks to kickstart files>>>> Generating hdlist (rpm database)>>>> Segregating RPMs (rocks, non-rocks)>>>> sh: ./kickstart.cgi: No such file or directory>>>> sh: ./kickstart.cgi: No such file or directory>>>> Traceback (innermost last):>>>> File "/opt/rocks/bin/rocks-dist", line 807, in ?>>>> app.run()>>>> File "/opt/rocks/bin/rocks-dist", line 623, in run>>>> eval('self.command_%s()' % (command))>>>> File "<string>", line 0, in ?>>>> File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom>>>> builder.build()>>>> File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build>>>> (rocks, nonrocks) =3D self.segregateRPMS()>>>> File "/opt/rocks/lib/python/rocks/build.py", line 
1107, in=20>>>> segregateRPMS>>>> for pkg in ks.getSection('packages'):>>>> TypeError: loop over non-sequence>>>>>> Any ideas?>>>>>> --=20>>> Vicky Rowley email: vrowley at ucsd.edu>>> Biomedical Informatics Research Network work: (858) 536-5980


>>> University of California, San Diego fax: (858) 822-0828>>> 9500 Gilman Drive>>> La Jolla, CA 92093-0715>>>>>>>>> See pictures from our trip to China at=20>>> http://www.sagacitech.com/Chinaweb>>>>>> -- __--__-->> Message: 10>> Date: Wed, 10 Dec 2003 16:50:16 -0800>> From: "V. Rowley" <vrowley at ucsd.edu>>> To: "Mason J. Katz" <mjk at sdsc.edu>>> CC: npaci-rocks-discussion at sdsc.edu>> Subject: Re: [Rocks-Discuss]"TypeError: loop over non-sequence" when>> trying>> to build CD distro>>>> Yep, I did that, but only *AFTER* getting the error. [Thought it >> was=20>> generated by the rocks-dist sequence, but apparently not.] Go >> ahead.=20>> Move it back. Same difference.>>>> Vicky>>>> Mason J. Katz wrote:>>> It looks like someone moved the profiles directory to profiles.orig.>>> =20>>> -mjk>>> =20>>> =20>>> [root at rocks14 install]# ls -l>>> total 56>>> drwxr-sr-x 3 root wheel 4096 Dec 10 21:16 cdrom>>> drwxrwsr-x 5 root wheel 4096 Dec 10 20:38 contrib.orig>>> drwxr-sr-x 3 root wheel 4096 Dec 10 21:07=20>>> ftp.rocksclusters.org>>> drwxr-sr-x 3 root wheel 4096 Dec 10 20:38=20>>> ftp.rocksclusters.org.orig>>> -r-xrwsr-x 1 root wheel 19254 Sep 3 12:40 kickstart.cgi>>> drwxr-xr-x 3 root root 4096 Dec 10 20:38 profiles.orig>>> drwxr-sr-x 3 root wheel 4096 Dec 10 21:15 rocks-dist>>> drwxrwsr-x 3 root wheel 4096 Dec 10 20:38>> rocks-dist.orig>>> drwxr-sr-x 3 root wheel 4096 Dec 10 21:02 src>>> drwxr-sr-x 4 root wheel 4096 Dec 10 20:49 src.foo>>> On Dec 10, 2003, at 2:43 PM, V. Rowley wrote:>>> =20>>>> When I run this:>>>>>>>> [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ;=20>>>> rocks-dist --dist=3Dcdrom cdrom>>>>>>>> on a server installed with ROCKS 3.0.0, I eventually get this:>>>>>>>>> Cleaning distribution>>>>> Resolving versions (RPMs)


>>>>> Resolving versions (SRPMs)>>>>> Adding support for rebuild distribution from source>>>>> Creating files (symbolic links - fast)>>>>> Creating symlinks to kickstart files>>>>> Fixing Comps Database>>>>> Generating hdlist (rpm database)>>>>> Patching second stage loader (eKV, partioning, ...)>>>>> patching "rocks-ekv" into distribution ...>>>>> patching "rocks-piece-pipe" into distribution ...>>>>> patching "PyXML" into distribution ...>>>>> patching "expat" into distribution ...>>>>> patching "rocks-pylib" into distribution ...>>>>> patching "MySQL-python" into distribution ...>>>>> patching "rocks-kickstart" into distribution ...>>>>> patching "rocks-kickstart-profiles" into distribution ...>>>>> patching "rocks-kickstart-dtds" into distribution ...>>>>> building CRAM filesystem ...>>>>> Cleaning distribution>>>>> Resolving versions (RPMs)>>>>> Resolving versions (SRPMs)>>>>> Creating symlinks to kickstart files>>>>> Generating hdlist (rpm database)>>>>> Segregating RPMs (rocks, non-rocks)>>>>> sh: ./kickstart.cgi: No such file or directory>>>>> sh: ./kickstart.cgi: No such file or directory>>>>> Traceback (innermost last):>>>>> File "/opt/rocks/bin/rocks-dist", line 807, in ?>>>>> app.run()>>>>> File "/opt/rocks/bin/rocks-dist", line 623, in run>>>>> eval('self.command_%s()' % (command))>>>>> File "<string>", line 0, in ?>>>>> File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom>>>>> builder.build()>>>>> File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build>>>>> (rocks, nonrocks) =3D self.segregateRPMS()>>>>> File "/opt/rocks/lib/python/rocks/build.py", line 1107, in=20>>>>> segregateRPMS>>>>> for pkg in ks.getSection('packages'):>>>>> TypeError: loop over non-sequence>>>>>>>>>>>> Any ideas?>>>>>>>> --=20>>>> Vicky Rowley email: vrowley at ucsd.edu>>>> Biomedical Informatics Research Network work: (858) 536-5980>>>> University of California, San Diego fax: (858) 822-0828>>>> 9500 Gilman Drive>>>> La Jolla, CA 92093-0715>>>>>>>>>>>> See 
pictures from our trip to China at>> http://www.sagacitech.com/Chinaweb>>> =20>>> =20>>> =20>>>> --=20>> Vicky Rowley email: vrowley at ucsd.edu


>> Biomedical Informatics Research Network work: (858) 536-5980>> University of California, San Diego fax: (858) 822-0828>> 9500 Gilman Drive>> La Jolla, CA 92093-0715>>>>>> See pictures from our trip to China at>> http://www.sagacitech.com/Chinaweb>>>>>> -- __--__-->> Message: 11>> Date: Wed, 10 Dec 2003 17:23:25 -0800 (PST)>> From: Tim Carlson <tim.carlson at pnl.gov>>> Subject: Re: [Rocks-Discuss]"TypeError: loop over non-sequence" when>> trying to>> build CD distro>> To: "V. Rowley" <vrowley at ucsd.edu>>> Cc: "Mason J. Katz" <mjk at sdsc.edu>, npaci-rocks-discussion at sdsc.edu>> Reply-to: Tim Carlson <tim.carlson at pnl.gov>>>>> On Wed, 10 Dec 2003, V. Rowley wrote:>>>> Did you remove python by chance? kickstart.cgi calls python directly >> in>> /usr/bin/python while rocks-dist does an "env python">>>> Tim>>>>> Yep, I did that, but only *AFTER* getting the error. [Thought it was>>> generated by the rocks-dist sequence, but apparently not.] Go ahead.>>> Move it back. Same difference.>>>>>> Vicky>>>>>> Mason J. Katz wrote:>>>> It looks like someone moved the profiles directory to profiles.orig.>>>>>>>> -mjk>>>>>>>>>>>> [root at rocks14 install]# ls -l>>>> total 56>>>> drwxr-sr-x 3 root wheel 4096 Dec 10 21:16 cdrom>>>> drwxrwsr-x 5 root wheel 4096 Dec 10 20:38 contrib.orig>>>> drwxr-sr-x 3 root wheel 4096 Dec 10 21:07>>>> ftp.rocksclusters.org>>>> drwxr-sr-x 3 root wheel 4096 Dec 10 20:38>>>> ftp.rocksclusters.org.orig>>>> -r-xrwsr-x 1 root wheel 19254 Sep 3 12:40>> kickstart.cgi>>>> drwxr-xr-x 3 root root 4096 Dec 10 20:38>> profiles.orig>>>> drwxr-sr-x 3 root wheel 4096 Dec 10 21:15 rocks-dist>>>> drwxrwsr-x 3 root wheel 4096 Dec 10 20:38>> rocks-dist.orig>>>> drwxr-sr-x 3 root wheel 4096 Dec 10 21:02 src>>>> drwxr-sr-x 4 root wheel 4096 Dec 10 20:49 src.foo>>>> On Dec 10, 2003, at 2:43 PM, V. Rowley wrote:


>>>>>>>>> When I run this:>>>>>>>>>> [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ;>>>>> rocks-dist --dist=3Dcdrom cdrom>>>>>>>>>> on a server installed with ROCKS 3.0.0, I eventually get this:>>>>>>>>>>> Cleaning distribution>>>>>> Resolving versions (RPMs)>>>>>> Resolving versions (SRPMs)>>>>>> Adding support for rebuild distribution from source>>>>>> Creating files (symbolic links - fast)>>>>>> Creating symlinks to kickstart files>>>>>> Fixing Comps Database>>>>>> Generating hdlist (rpm database)>>>>>> Patching second stage loader (eKV, partioning, ...)>>>>>> patching "rocks-ekv" into distribution ...>>>>>> patching "rocks-piece-pipe" into distribution ...>>>>>> patching "PyXML" into distribution ...>>>>>> patching "expat" into distribution ...>>>>>> patching "rocks-pylib" into distribution ...>>>>>> patching "MySQL-python" into distribution ...>>>>>> patching "rocks-kickstart" into distribution ...>>>>>> patching "rocks-kickstart-profiles" into distribution ...>>>>>> patching "rocks-kickstart-dtds" into distribution ...>>>>>> building CRAM filesystem ...>>>>>> Cleaning distribution>>>>>> Resolving versions (RPMs)>>>>>> Resolving versions (SRPMs)>>>>>> Creating symlinks to kickstart files>>>>>> Generating hdlist (rpm database)>>>>>> Segregating RPMs (rocks, non-rocks)>>>>>> sh: ./kickstart.cgi: No such file or directory>>>>>> sh: ./kickstart.cgi: No such file or directory>>>>>> Traceback (innermost last):>>>>>> File "/opt/rocks/bin/rocks-dist", line 807, in ?>>>>>> app.run()>>>>>> File "/opt/rocks/bin/rocks-dist", line 623, in run>>>>>> eval('self.command_%s()' % (command))>>>>>> File "<string>", line 0, in ?>>>>>> File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom>>>>>> builder.build()>>>>>> File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build>>>>>> (rocks, nonrocks) =3D self.segregateRPMS()>>>>>> File "/opt/rocks/lib/python/rocks/build.py", line 1107, in>>>>>> segregateRPMS>>>>>> for pkg in 
ks.getSection('packages'):>>>>>> TypeError: loop over non-sequence>>>>>>>>>>>>>>> Any ideas?>>>>>>>>>> -->>>>> Vicky Rowley email: vrowley at ucsd.edu>>>>> Biomedical Informatics Research Network work: (858) 536-5980>>>>> University of California, San Diego fax: (858) 822-0828>>>>> 9500 Gilman Drive>>>>> La Jolla, CA 92093-0715


>>>>>>>>>>>>>>> See pictures from our trip to China at>> http://www.sagacitech.com/Chinaweb>>>>>>>>>>>>>>>>>> -->>> Vicky Rowley email: vrowley at ucsd.edu>>> Biomedical Informatics Research Network work: (858) 536-5980>>> University of California, San Diego fax: (858) 822-0828>>> 9500 Gilman Drive>>> La Jolla, CA 92093-0715>>>>>>>>> See pictures from our trip to China at>> http://www.sagacitech.com/Chinaweb>>>>>>>>>>>>>>>> -- __--__-->> _______________________________________________>> npaci-rocks-discussion mailing list>> npaci-rocks-discussion at sdsc.edu>> http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion>>>>>> End of npaci-rocks-discussion Digest>>>>>> DISCLAIMER:>> This email is confidential and may be privileged. If you are not the =>> intended recipient, please delete it and notify us immediately. >> Please =>> do not copy or use it for any purpose, or disclose its contents to >> any =>> other person as it may be an offence under the Official Secrets Act. =>> Thank you.>>>> --__--__-->>>> Message: 2>> Date: Wed, 10 Dec 2003 18:03:41 -0800>> From: Terrence Martin <tmartin at physics.ucsd.edu>>> To: npaci-rocks-discussion at sdsc.edu>> Subject: [Rocks-Discuss]Rocks 3.0.0>>>> I am having a problem on install of rocks 3.0.0 on my new cluster.>>>> The python error occurs right after anaconda starts and just before >> the>> install asks for the roll CDROM.>>>> The error refers to an inability to find or load rocks.file. The error>> is associated I think with the window that pops up and asks you in put


>> the roll CDROM in.>>>> The process I followed to get to this point is>>>> Put the Rocks 3.0.0 CDROM into the CDROM drive>> Boot the system>> At the prompt type frontend>> Wait till anaconda starts>> Error referring to unable to load rocks.file.>>>> I have successfully installed rocks on a smaller cluster but that has>> different hardware. I used the same CDROM for both installs.>>>> Any thoughts?>>>> Terrence>>>>>>>> --__--__-->>>> Message: 3>> Date: Wed, 10 Dec 2003 19:52:49 -0800>> From: "V. Rowley" <vrowley at ucsd.edu>>> To: npaci-rocks-discussion at sdsc.edu>> Subject: Re: [Rocks-Discuss]"TypeError: loop over non-sequence" when>> trying>> to build CD distro>>>> Looks like python is okay:>>>>> [root at rocks14 birn-oracle1]# which python>>> /usr/bin/python>>> [root at rocks14 birn-oracle1]# python --help>>> Unknown option: -->>> usage: python [option] ... [-c cmd | file | -] [arg] ...>>> Options and arguments (and corresponding environment variables):>>> -d : debug output from parser (also PYTHONDEBUG=x)>>> -i : inspect interactively after running script, (also>> PYTHONINSPECT=x)>>> and force prompts, even if stdin does not appear to be a>> terminal>>> -O : optimize generated bytecode (a tad; also PYTHONOPTIMIZE=x)>>> -OO : remove doc-strings in addition to the -O optimizations>>> -S : don't imply 'import site' on initialization>>> -t : issue warnings about inconsistent tab usage (-tt: issue>> errors)>>> -u : unbuffered binary stdout and stderr (also >>> PYTHONUNBUFFERED=x)>>> -v : verbose (trace import statements) (also PYTHONVERBOSE=x)>>> -x : skip first line of source, allowing use of non-Unix forms of>> #!cmd>>> -X : disable class based built-in exceptions>>> -c cmd : program passed in as string (terminates option list)>>> file : program read from script file>>> - : program read from stdin (default; interactive mode if a tty)>>> arg ...: arguments passed to program in sys.argv[1:]>>> Other environment variables:>>> PYTHONSTARTUP: file executed on 
interactive startup (no default)


>>> PYTHONPATH : ':'-separated list of directories prefixed to the>>> default module search path. The result is sys.path.>>> PYTHONHOME : alternate <prefix> directory (or>> <prefix>:<exec_prefix>).>>> The default module search path uses >>> <prefix>/python1.5.>>> [root at rocks14 birn-oracle1]#>>>>>>>> Tim Carlson wrote:>>> On Wed, 10 Dec 2003, V. Rowley wrote:>>>>>> Did you remove python by chance? kickstart.cgi calls python directly>> in>>> /usr/bin/python while rocks-dist does an "env python">>>>>> Tim>>>>>>>>>> Yep, I did that, but only *AFTER* getting the error. [Thought it >>>> was>>>> generated by the rocks-dist sequence, but apparently not.] Go >>>> ahead.>>>> Move it back. Same difference.>>>>>>>> Vicky>>>>>>>> Mason J. Katz wrote:>>>>>>>>> It looks like someone moved the profiles directory to >>>>> profiles.orig.>>>>>>>>>> -mjk>>>>>>>>>>>>>>> [root at rocks14 install]# ls -l>>>>> total 56>>>>> drwxr-sr-x 3 root wheel 4096 Dec 10 21:16 cdrom>>>>> drwxrwsr-x 5 root wheel 4096 Dec 10 20:38 >>>>> contrib.orig>>>>> drwxr-sr-x 3 root wheel 4096 Dec 10 21:07>>>>> ftp.rocksclusters.org>>>>> drwxr-sr-x 3 root wheel 4096 Dec 10 20:38>>>>> ftp.rocksclusters.org.orig>>>>> -r-xrwsr-x 1 root wheel 19254 Sep 3 12:40 >>>>> kickstart.cgi>>>>> drwxr-xr-x 3 root root 4096 Dec 10 20:38 >>>>> profiles.orig>>>>> drwxr-sr-x 3 root wheel 4096 Dec 10 21:15 rocks-dist>>>>> drwxrwsr-x 3 root wheel 4096 Dec 10 20:38>> rocks-dist.orig>>>>> drwxr-sr-x 3 root wheel 4096 Dec 10 21:02 src>>>>> drwxr-sr-x 4 root wheel 4096 Dec 10 20:49 src.foo>>>>> On Dec 10, 2003, at 2:43 PM, V. Rowley wrote:>>>>>>>>>>>>>>>> When I run this:>>>>>>


>>>>>> [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ;>>>>>> rocks-dist --dist=cdrom cdrom>>>>>>>>>>>> on a server installed with ROCKS 3.0.0, I eventually get this:>>>>>>>>>>>>>>>>>>> Cleaning distribution>>>>>>> Resolving versions (RPMs)>>>>>>> Resolving versions (SRPMs)>>>>>>> Adding support for rebuild distribution from source>>>>>>> Creating files (symbolic links - fast)>>>>>>> Creating symlinks to kickstart files>>>>>>> Fixing Comps Database>>>>>>> Generating hdlist (rpm database)>>>>>>> Patching second stage loader (eKV, partioning, ...)>>>>>>> patching "rocks-ekv" into distribution ...>>>>>>> patching "rocks-piece-pipe" into distribution ...>>>>>>> patching "PyXML" into distribution ...>>>>>>> patching "expat" into distribution ...>>>>>>> patching "rocks-pylib" into distribution ...>>>>>>> patching "MySQL-python" into distribution ...>>>>>>> patching "rocks-kickstart" into distribution ...>>>>>>> patching "rocks-kickstart-profiles" into distribution ...>>>>>>> patching "rocks-kickstart-dtds" into distribution ...>>>>>>> building CRAM filesystem ...>>>>>>> Cleaning distribution>>>>>>> Resolving versions (RPMs)>>>>>>> Resolving versions (SRPMs)>>>>>>> Creating symlinks to kickstart files>>>>>>> Generating hdlist (rpm database)>>>>>>> Segregating RPMs (rocks, non-rocks)>>>>>>> sh: ./kickstart.cgi: No such file or directory>>>>>>> sh: ./kickstart.cgi: No such file or directory>>>>>>> Traceback (innermost last):>>>>>>> File "/opt/rocks/bin/rocks-dist", line 807, in ?>>>>>>> app.run()>>>>>>> File "/opt/rocks/bin/rocks-dist", line 623, in run>>>>>>> eval('self.command_%s()' % (command))>>>>>>> File "<string>", line 0, in ?>>>>>>> File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom>>>>>>> builder.build()>>>>>>> File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build>>>>>>> (rocks, nonrocks) = self.segregateRPMS()>>>>>>> File "/opt/rocks/lib/python/rocks/build.py", line 1107, in>>>>>>> segregateRPMS>>>>>>> for pkg in 
ks.getSection('packages'):>>>>>>> TypeError: loop over non-sequence>>>>>>>>>>>>>>>>>> Any ideas?>>>>>>>>>>>> -->>>>>> Vicky Rowley email: vrowley at ucsd.edu>>>>>> Biomedical Informatics Research Network work: (858) 536-5980>>>>>> University of California, San Diego fax: (858) 822-0828>>>>>> 9500 Gilman Drive>>>>>> La Jolla, CA 92093-0715>>>>>>>>>>>>


>>>>>> See pictures from our trip to China at>> http://www.sagacitech.com/Chinaweb>>>>>>>>>>>>>>>>>>> -->>>> Vicky Rowley email: vrowley at ucsd.edu>>>> Biomedical Informatics Research Network work: (858) 536-5980>>>> University of California, San Diego fax: (858) 822-0828>>>> 9500 Gilman Drive>>>> La Jolla, CA 92093-0715>>>>>>>>>>>> See pictures from our trip to China at>> http://www.sagacitech.com/Chinaweb>>>>>>>>>>>>>>>>>>>>>>>> -- >> Vicky Rowley email: vrowley at ucsd.edu>> Biomedical Informatics Research Network work: (858) 536-5980>> University of California, San Diego fax: (858) 822-0828>> 9500 Gilman Drive>> La Jolla, CA 92093-0715>>>>>> See pictures from our trip to China at>> http://www.sagacitech.com/Chinaweb>>>>>>>> --__--__-->>>> _______________________________________________>> npaci-rocks-discussion mailing list>> npaci-rocks-discussion at sdsc.edu>> http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion>>>>>> End of npaci-rocks-discussion Digest>>>>>> DISCLAIMER:>> This email is confidential and may be privileged. If you are not the >> intended recipient, please delete it and notify us immediately. >> Please do not copy or use it for any purpose, or disclose its >> contents to any other person as it may be an offence under the >> Official Secrets Act. Thank you.>>>>> Glen Otero, Ph.D.> Linux Prophet> 619.917.1772>>


Glen Otero, Ph.D.
Linux Prophet
619.917.1772

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: text/enriched
Size: 35605 bytes
Desc: not available
Url : https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/attachments/20031211/1a0b38fb/attachment-0001.bin

From tmartin at physics.ucsd.edu Fri Dec 12 10:26:58 2003
From: tmartin at physics.ucsd.edu (Terrence Martin)
Date: Fri, 12 Dec 2003 10:26:58 -0800
Subject: [Rocks-Discuss]ftp.rocksclusters.org mirror?
Message-ID: <[email protected]>

I was wondering, does the command rocks-dist do anything else besides call wget on the correct tree at ftp.rocksclusters.org?

I ask because some firewall restrictions on a system I am hesitant to fiddle are preventing me from running rocks-dist mirror from my head node. I would like to download the mirror of the rocks distro on another system, transfer the tree and then run rocks-dist dist to rebuild the rocks for my compute nodes. Is this reasonable?

Also, am I going to run into any problems with rocks 3.0.0 having installed the head node on a UP system when my compute nodes are SMP? I am making the assumption that once I get all of the packages into rocks (currently there are no smp kernels on the head node) the compute nodes will install the right kernel?

BTW thanks for the help so far, the trick it seems to getting Rocks 3.0.0 on these supermicro systems is to install rocks on the hard drive in a separate computer and then install the hard disk.

Thanks,

Terrence

From mjk at sdsc.edu Fri Dec 12 10:48:17 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Fri, 12 Dec 2003 10:48:17 -0800
Subject: [Rocks-Discuss]ftp.rocksclusters.org mirror?
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

- Yes, "rocks-dist mirror" does a python system() call to run the wget application. It does this several times for the various directories it needs.

- No, the compute nodes do not need to be the same SMPness as the frontend. All installations are done with Red Hat Kickstart (plus our pixie dust) so hardware is auto detected for you. This is not disk imaging :)

-mjk

On Dec 12, 2003, at 10:26 AM, Terrence Martin wrote:

> I was wondering, does the command rocks-dist do anything else besides
> call wget on the correct tree at ftp.rocksclusters.org?
>
> I ask because some firewall restrictions on a system I am hesitant to
> fiddle are preventing me from running rocks-dist mirror from my head
> node. I would like to download the mirror of the rocks distro on
> another system, transfer the tree and then run rocks-dist dist to
> rebuild the rocks for my compute nodes. Is this reasonable?
>
> Also am I going to run into any problems with rocks 3.0.0 having
> installed the head node on a UP system but my compute nodes are SMP? I
> am making an assumption that once I get all of the packages into rocks
> (currently there is no smp kernels on the head node) the compute nodes
> will install the right kernel?
>
> BTW thanks for the help so far, the trick it seems to getting Rocks
> 3.0.0 on these supermicro systems is to install rocks on the hard
> drive in a separate computer and then install the hard disk.
>
> Thanks,
>
> Terrence
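Since "rocks-dist mirror" is just wget underneath, Terrence's download-elsewhere-and-transfer plan can be sketched as a short script. Everything here is an assumption, not taken from the Rocks docs: the exact mirror URL, the tarball location, and the unpack directory would all need checking against a real frontend. DRYRUN defaults to on, so the block only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical sketch: mirror the tree on a machine that can reach the
# net, then carry a tarball to the firewalled frontend. URL and paths
# are guesses -- verify against your own /home/install layout.
DRYRUN=${DRYRUN:-1}
run() { if [ "$DRYRUN" = 1 ]; then echo "$@"; else "$@"; fi; }

MIRROR=ftp://ftp.rocksclusters.org/pub/rocks   # assumed URL
DEST=/tmp/rocks-mirror

# on the machine with web access:
run wget --mirror --no-parent -nH "$MIRROR" -P "$DEST"
run tar czf /tmp/rocks-mirror.tgz -C "$DEST" .

# after copying the tarball to the frontend:
run tar xzf /tmp/rocks-mirror.tgz -C /home/install
run rocks-dist dist
```

With DRYRUN=1 (the default) the script only echoes the four commands, so it is safe to paste and inspect before committing to a real transfer.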

From mjk at sdsc.edu Fri Dec 12 10:54:03 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Fri, 12 Dec 2003 10:54:03 -0800
Subject: [Rocks-Discuss]ia64 compute nodes with ia32 frontends?
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]> <[email protected]>
Message-ID: <[email protected]>

We haven't done this for a while, and since our 3.0 release uses different versions of Red Hat for x86 and IA64, cross-building a distribution may not work. 3.1.0 (since you are on campus you'll get a CD set from us next week) uses the same base RH for all architectures, so this should be possible again.

The mirror should have worked:

# rocks-dist --arch=ia64 mirror

This should mirror the ia64 tree from ftp.rocksclusters.org. You can also use your IA64 DVD: mount it on /mnt/cdrom and do a "rocks-dist copycd" to create the IA64 mirror.

If this works you will then need to use the --genhdlist flag w/ rocks-dist.

For example:

# cd /home/install
# rocks-dist dist                  --- build the x86 distribution
# rocks-dist --arch=ia64 --genhdlist=rocks-dist/.../i386/.../genhdlist

You'll need to use find to determine the path of the genhdlist executable in your x86 distribution. This may still fail (since the RH versions differ), but it does work when the versions are the same for both archs.

-mjk

On Dec 11, 2003, at 2:29 PM, Edward O'Connor wrote:

> Hi everybody,
>
> I'm trying to bring up some ia64 compute nodes in a cluster with an
> ia32 frontend. Normally, `cd /home/install; rocks-dist mirror dist`
> only sets up the frontend to handle ia32 compute nodes. I tried to
> manhandle `rocks-dist mirror` into mirroring the ia64 stuff from
> ftp.rocksclusters.org by giving it the --arch=ia64 option, but that
> didn't work, so I went ahead and did the mirroring step by hand.
>
> After having done so, `rocks-dist dist` still doesn't do the right
> thing. So, adding --arch=ia64 to that command yields this error output:
>
> ,----
> | # rocks-dist --arch=ia64 dist
> | Cleaning distribution
> | Resolving versions (RPMs)
> | Resolving versions (SRPMs)
> | Adding support for rebuild distribution from source
> | Creating files (symbolic links - fast)
> | Creating symlinks to kickstart files
> | Fixing Comps Database
> | error - comps file is missing, skipping this step
> | Generating hdlist (rpm database)
> | error - could not find rpm anaconda-runtime
> | error - could not find genhdlist
> | Patching second stage loader (eKV, partioning, ...)
> | error - could not find second stage, skipping this step
> `----
>
> So my question is, what do I need to do to the ia32 frontend to enable
> it to kickstart an ia64 compute node? Thanks.
>
>
> Ted
>
> --
> Edward O'Connor
> oconnor at ucsd.edu
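Mason's "use find to determine the path of the genhdlist executable" step can be illustrated like this. The block builds a throwaway mock of the distribution layout in a temp directory so nothing real is touched; against a live frontend you would run the same find on /home/install/rocks-dist instead.

```shell
# Mock a slice of the distribution tree, then locate genhdlist with find,
# exactly as you would against the real /home/install/rocks-dist tree.
tmp=$(mktemp -d)
mkdir -p "$tmp/rocks-dist/7.3/en/os/i386/usr/lib/anaconda-runtime"
touch "$tmp/rocks-dist/7.3/en/os/i386/usr/lib/anaconda-runtime/genhdlist"

find "$tmp/rocks-dist" -name genhdlist   # prints the full path to the file

rm -rf "$tmp"
```

The path printed by find is what gets passed to the --genhdlist flag.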

From mjk at sdsc.edu Fri Dec 12 11:12:59 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Fri, 12 Dec 2003 11:12:59 -0800
Subject: [Rocks-Discuss]I can't use xpbs in rocks
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

Unfortunately we don't have a fix here. We've moved to SGE (you can now use QMon). We do have a PBS roll but we plan to release 3.1 before the PBS roll is complete.

-mjk

On Dec 10, 2003, at 8:44 PM, zhong wenyu wrote:

> Hi,everyone!
> I have installed rocks 2.3.2 and 3.0.0, xpbs can not be used in both of
> them.
> typed: xpbs [enter]
> showed: xpbs: initialization failed! output: invalid command name
> "Pref_Init"
> thanks!
>
> _________________________________________________________________
> ?????????????? MSN Messenger: http://messenger.msn.com/cn

From fparnold at chem.northwestern.edu Fri Dec 12 06:52:45 2003
From: fparnold at chem.northwestern.edu (Fred P. Arnold)
Date: Fri, 12 Dec 2003 08:52:45 -0600 (CST)
Subject: [Rocks-Discuss]Gig E on HP ZX6000
Message-ID: <Pine.GSO.4.33.0312120850030.4235-100000@mercury.chem.northwestern.edu>

Hello,

I know this is a hardware question, not technically a Rocks one, but I can't find the answer in my HP manuals:

On the ZX6000, there are two ethernet ports, a 10/100 basic/management port, and a 1000 which is designated the primary interface. Unfortunately, rocks always identifies the 10/100 as eth0.

Does anyone know how to disable the 10/100 on a ZX6000? On an IA32, I'd go into the bios, but these don't technically have one. We'd like to run ours on a pure Gig network.

Thanks.

-Fred

Frederick P. Arnold, Jr.
NUIT, Northwestern U.
f-arnold at northwestern.edu

From mjk at sdsc.edu Fri Dec 12 11:16:42 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Fri, 12 Dec 2003 11:16:42 -0800
Subject: [Rocks-Discuss]ScalablePBS.
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

hi Roy,

This should become the basis of the PBS roll (currently openpbs). We are seeking developers who would like to help write and maintain this -- I'm not singling you out Roy, although you would be more than welcome, rather I'm taking advantage of your message to solicit other volunteers. Anyone?

-mjk

On Nov 21, 2003, at 2:52 PM, Roy Dragseth wrote:

> Hi folks.
>
> I've been testing ScalablePBS (SPBS) from supercluster.org for a few
> weeks now and it seems like a fairly good replacement for OpenPBS. Only
> a few minor changes to the OpenPBS infrastructure were needed to
> accomplish the necessary changes in the kickstart generation to make
> the nodes switch to SPBS.
>
> SPBS is based on OpenPBS 2.3.12, but incorporates most provided patches
> (sandia etc) and is actively developed by the same maintainers that
> develop maui. It scales better than OpenPBS, to around 2K nodes, has
> better fault tolerance and communicates better with maui. It has, as
> far as I can see, no user visible changes from OpenPBS.
>
> I know, a lot of people are moving away from pbs and into sge, I was
> thinking about making the switch too. The emergence of SPBS seems to
> make the switch unnecessary and I don't have to teach myself (and the
> users) a new queueing interface...
>
> Configuration tested:
> Rocks 3.0.0
> SPBS 1.0.1p0 (should leave beta phase next month)
> Maui 3.2.6p6 (available for "Early Access Production")
>
> SPBS and Maui can be downloaded from http://www.supercluster.org/
>
> Have a nice weekend,
> r.
>
> --
> The Computer Center, University of Tromsø, N-9037 TROMSØ, Norway.
> phone: +47 77 64 41 07, fax: +47 77 64 41 00
> Roy Dragseth, High Performance Computing System Administrator
> Direct call: +47 77 64 62 56. email: royd at cc.uit.no

From jlkaiser at fnal.gov Fri Dec 12 11:25:58 2003
From: jlkaiser at fnal.gov (Joseph L. Kaiser)
Date: Fri, 12 Dec 2003 13:25:58 -0600
Subject: [Rocks-Discuss](no subject)
Message-ID: <[email protected]>

My install of 3.0.0 is crapping out here:

"/usr/src/build/90289-i386/install//usr/lib/anaconda/comps.py", line 153, in __getitem__
KeyError: PyXML

Even though PyXML is in the distribution I have built. Is there anything that can cause this other than the missing RPM?

Thanks,

Joe

From oconnor at soe.ucsd.edu Fri Dec 12 11:36:04 2003
From: oconnor at soe.ucsd.edu (Edward O'Connor)
Date: Fri, 12 Dec 2003 11:36:04 -0800
Subject: [Rocks-Discuss]ia64 compute nodes with ia32 frontends?
In-Reply-To: <[email protected]> (Mason J. Katz's message of "Fri, 12 Dec 2003 10:54:03 -0800")
References: <[email protected]> <[email protected]> <[email protected]> <[email protected]>
Message-ID: <[email protected]>

> We haven't done this for a while, and since our 3.0 release using
> different version of Red Hat for x86 and IA64 cross-building
> distribution may not work.

Ahh. After further travails (read below), I'm pretty willing to suspect that this indeed does not work in Rocks 3.0.0. I'm looking forward to those 3.1.0 CDs and DVDs next week! :)

> you can also use your IA64 DVD mount it on /mnt/cdrom and do a
> "rocks-dist copycd" to create the IA64 mirror.

Unfortunately, the ia32 frontend machine doesn't have a DVD drive in it. So I mounted the ia64 ISO image on /mnt/cdrom via a loopback device and that worked fine.


However, `rocks-dist copycd` seemed to have nuked the ia32 stuff under /home/install/ftp.rocksclusters.org/, or, if it didn't entirely nuke it, it made the bare `rocks-dist dist` of your next instructions fail:

> If this works you will the to use the --genhdlist flag w/ rocks-dist.
> For example:
>
> # cd /home/install
> # rocks-dist dist --- build the x86 distribution

As this failed, I went ahead and also ran a `rocks-dist mirror`, which proceeded to download a whole lot of stuff from you guys. After it finished, `rocks-dist dist` completed without error. I double-checked and the ia64 mirror from the `rocks-dist copycd` command still appears to be there.

> # rocks-dist --arch=ia64 --genhdlist=rocks-dist/.../i386/.../genhdlist

Should there be a `dist` at the end of that? The above command (with the substitution of the appropriate genhdlist path) appears to be a no-op. So I appended a `dist` as the idea is for it to create the appropriate symlinks for ia64 as well, and it bombs out too, in the same way as before:

,----
| # rocks-dist --arch=ia64 --genhdlist=rocks-dist/7.3/en/os/i386/usr/lib/anaconda-runtime/genhdlist dist
| Cleaning distribution
| Resolving versions (RPMs)
| Resolving versions (SRPMs)
| Adding support for rebuild distribution from source
| Creating files (symbolic links - fast)
| Creating symlinks to kickstart files
| Fixing Comps Database
| error - comps file is missing, skipping this step
| Generating hdlist (rpm database)
| error creating file /home/install/rocks-dist/desktop/7.3/en/os/ia64/RedHat/base/hdlist: No such file or directory
| Patching second stage loader (eKV, partioning, ...)
| error - could not find second stage, skipping this step
`----

> You'll need to use find to determine the path of the genhdlist
> executable in you x86 distribution. This may still fail (since RH
> version differ), but it does work when the version are the same for
> both archs.

I suppose at this point that it's still failing due to the RH version mismatch, and that getting this to work in 3.0.0 is a lost cause.

Ted

--
Edward O'Connor
oconnor at ucsd.edu


From jared_hodge at iat.utexas.edu Fri Dec 12 12:07:32 2003
From: jared_hodge at iat.utexas.edu (Jared Hodge)
Date: Fri, 12 Dec 2003 14:07:32 -0600
Subject: [Rocks-Discuss]I can't use xpbs in rocks
References: <[email protected]> <[email protected]>
Message-ID: <[email protected]>

OK, I've got a fix for this one. The problem is that xpbs thinks that it's in the directory /var/tmp/OpenPBS-buildroot/opt/OpenPBS/. Anyway, the path is mangled to get to some of the subroutines. The rocks guys can figure out a way to prevent this in future releases, but here's how you can get it working (and pbsmon while we're at it):

First fix the scripts. /opt/OpenPBS/bin/xpbs needs the following changes:

#set libdir /var/tmp/OpenPBS-buildroot/opt/OpenPBS/lib/xpbs
#set appdefdir /var/tmp/OpenPBS-buildroot/opt/OpenPBS/lib/xpbs
set libdir /opt/OpenPBS/lib/xpbs
set appdefdir /opt/OpenPBS/lib/xpbs

/opt/OpenPBS/bin/xpbsmon needs the same thing, plus the first line needs to be changed.

Now do the following:

cd /opt/OpenPBS/lib/xpbs
rm tclIndex
./buildindex `pwd`
cd /opt/OpenPBS/lib/xpbsmon
rm tclIndex
./buildindex `pwd`

That should fix it all up. I tested this on a 2.3.2 cluster, I assume it's the same on 3.0.
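Jared's two hand-edits can also be scripted with sed. This is a sketch only, demonstrated on a throwaway temp file rather than the real scripts; on a live node you would point the sed at /opt/OpenPBS/bin/xpbs and /opt/OpenPBS/bin/xpbsmon after backing them up.

```shell
# Rewrite the buildroot paths in a mock copy of the xpbs script. On a
# real system the targets would be /opt/OpenPBS/bin/xpbs and xpbsmon.
f=$(mktemp)
cat > "$f" <<'EOF'
set libdir /var/tmp/OpenPBS-buildroot/opt/OpenPBS/lib/xpbs
set appdefdir /var/tmp/OpenPBS-buildroot/opt/OpenPBS/lib/xpbs
EOF

sed -i 's|/var/tmp/OpenPBS-buildroot/opt/OpenPBS|/opt/OpenPBS|g' "$f"

cat "$f"    # both paths now start with /opt/OpenPBS
rm -f "$f"
```

After rewriting the real scripts, the tclIndex rebuild steps above still need to be run by hand.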

--
Jared Hodge
The Institute for Advanced Technology
The University of Texas at Austin
3925 W. Braker Lane, Suite 400
Austin, Texas 78759

Phone: 512-232-4460
Fax: 512-471-9096
Email: jared_hodge at iat.utexas.edu

Mason J. Katz wrote:

> Unfortunately we don't have a fix here. We've moved to SGE (your can
> now use QMon). We do have a PBS roll but we plan to release 3.1
> before the PBS roll is complete.
>
> -mjk


>
> On Dec 10, 2003, at 8:44 PM, zhong wenyu wrote:
>
>> Hi,everyone!
>> I have installed rocks 2.3.2 and 3.0.0, xpbs can not be used in both
>> of them.
>> typed: xpbs [enter]
>> showed: xpbs: initialization failed! output: invalid command name
>> "Pref_Init"
>> thanks!
>>
>> _________________________________________________________________
>> ?????????????? MSN Messenger: http://messenger.msn.com/cn

From jlkaiser at fnal.gov Fri Dec 12 14:39:42 2003
From: jlkaiser at fnal.gov (Joe Kaiser)
Date: Fri, 12 Dec 2003 16:39:42 -0600
Subject: [Rocks-Discuss](no subject)
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

Sorry, creating extra links where they don't belong. Nevermind.

On Fri, 2003-12-12 at 13:25, Joseph L. Kaiser wrote:
> My install of 3.0.0 is crapping out here:
>
> "/usr/src/build/90289-i386/install//usr/lib/anaconda/comps.py", line 153, in __getitem__
> KeyError: PyXML
>
> Even though PyXML is in the distribution I have built. Is there
> anything that can cause this other than the missing RPM?
>
> Thanks,
>
> Joe
--
===================================================================
Joe Kaiser - Systems Administrator

Fermi Lab CD/OSS-SCS          Never laugh at live dragons.
630-840-6444
jlkaiser at fnal.gov
===================================================================


From jholland at cs.uh.edu Fri Dec 12 14:52:10 2003
From: jholland at cs.uh.edu (Jason Holland)
Date: Fri, 12 Dec 2003 16:52:10 -0600 (CST)
Subject: [Rocks-Discuss]Gig E on HP ZX6000
In-Reply-To: <Pine.GSO.4.33.0312120850030.4235-100000@mercury.chem.northwestern.edu>
References: <Pine.GSO.4.33.0312120850030.4235-100000@mercury.chem.northwestern.edu>
Message-ID: <[email protected]>

Fred,

Try flipping the modules in /etc/modules.conf. Flip eth0 with eth1 so that the gige interface comes up as eth0. Or, just turn off eth0 altogether with 'alias eth0 off'. I think that's the right syntax.

We have 60 zx6000's and I personally have never found a way to disable the port.
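Jason's suggestion would leave /etc/modules.conf looking something like the fragment below. The driver module names (e1000, eepro100) are assumptions for illustration only; check which modules the zx6000 actually loads before editing.

```
# /etc/modules.conf (illustrative fragment -- module names are guesses)
alias eth0 e1000       # gigabit port, now probed first
alias eth1 eepro100    # 10/100 management port
# or, to disable the 10/100 port entirely:
# alias eth1 off
```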

Jason P Holland
Texas Learning and Computation Center
http://www.tlc2.uh.edu
University of Houston
Philip G Hoffman Hall rm 207A
tel: 713-743-4850

On Fri, 12 Dec 2003, Fred P. Arnold wrote:

> Hello,
>
> I know this is a hardware question, not technically a Rocks one, but I
> can't find the answer in my HP manuals:
>
> On the ZX6000, there are two ethernet ports, a 10/100 basic/management
> port, and a 1000 which is designated the primary interface.
> Unfortunately, rocks always identifies the 10/100 as eth0.
>
> Does anyone know how to disable the 10/100 on a ZX6000? On an IA32, I'd
> go into the bios, but these don't technically have one. We'd like to run
> ours on a pure Gig network.
>
> Thanks.
>
> -Fred
>
> Frederick P. Arnold, Jr.
> NUIT, Northwestern U.
> f-arnold at northwestern.edu

From jian at appro.com Fri Dec 12 17:27:51 2003
From: jian at appro.com (Jian Chang)
Date: Fri, 12 Dec 2003 17:27:51 -0800
Subject: [Rocks-Discuss]RE: Rocks-Discuss] AMD Opteron - Contact Appro
Message-ID: <[email protected]>

Hello Mason / Puru, I got your contact information from Bryan Littlefield.


I would like to discuss with you regarding benchmark test systems you might need down the road. We can also share with you our findings as to what is compatible in the Opteron systems. Please reply with your phone number where I can reach you, and I will call promptly.

Bryan,

Thank you for the referral.

Best regards,

Jian Chang
Regional Sales Manager
(408) 941-8100 x 202
(800) 927-5464 x 202
(408) 941-8111 Fax
jian at appro.com
www.appro.com

-----Original Message-----
From: Bryan Littlefield [mailto:bryan at UCLAlumni.net]
Sent: Tuesday, December 09, 2003 12:14 PM
To: npaci-rocks-discussion at sdsc.edu; mjk at sdsc.edu
Cc: Jian Chang
Subject: Rocks-Discuss] AMD Opteron - Contact Appro

Hi Mason,

I suggest contacting Appro. We are using Rocks on our Opteron cluster and Appro would likely love to help. I will contact them as well to see if they could help get an Opteron machine for testing. Contact info below:

Thanks --Bryan

Jian Chang - Regional Sales Manager
(408) 941-8100 x 202
(800) 927-5464 x 202
(408) 941-8111 Fax
jian at appro.com
http://www.appro.com

npaci-rocks-discussion-request at sdsc.edu wrote:

From: "Mason J. Katz" <mjk at sdsc.edu>
Subject: Re: [Rocks-Discuss]AMD Opteron
Date: Tue, 9 Dec 2003 07:28:51 -0800
To: "purushotham komaravolu" <purikk at hotmail.com>

We have a beta right now that we have sent to a few people. We plan on a release this month, and AMD_64 will be part of this release along with the usual x86, IA64 support. If you want to help accelerate this process please talk to your vendor about loaning/giving us some hardware for testing. Having access to a variety of Opteron hardware (we own two boxes) is the only way we can have good support for this chip.

-mjk

On Dec 8, 2003, at 8:23 PM, purushotham komaravolu wrote:

Cc: <npaci-rocks-discussion at sdsc.edu>

Hello,

I am a newbie to ROCKS cluster. I wanted to set up clusters on 32-bit architectures (Intel and AMD) and 64-bit architectures (Intel and AMD). I found the 64-bit download for Intel on the website but not for AMD. Does it work for AMD Opteron? If not, what is the ETA for AMD-64? We are planning to buy AMD-64 bit machines shortly, and I would like to volunteer for the beta testing if needed.

Thanks,
Regards,
Puru

_______________________________________________
npaci-rocks-discussion mailing list
npaci-rocks-discussion at sdsc.edu
http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion

End of npaci-rocks-discussion Digest

-------------- next part --------------
An HTML attachment was scrubbed...
URL: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/attachments/20031212/dec7e41b/attachment-0001.html

From landman at scalableinformatics.com Sat Dec 13 07:50:02 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Sat, 13 Dec 2003 10:50:02 -0500
Subject: [Rocks-Discuss]Trying to integrate a new kernel into 3.0.0
Message-ID: <[email protected]>

Folks:

Finally built the 2.4.23 kernel into an RPM via the RedHat tools. Had to hack up the spec file a bit, but you can see the results at

http://scalableinformatics.com/downloads/kernels/2.4.23/

These are 2.4.23 with the 2.4.24-pre1 patch (e.g. xfs is in there, woo hoo!). I had to strip out most of the previous patches as they were incompatible with .23 (and I don't want to spend time forward porting them). The spec file, the sources, etc are released under the normal licenses (GPL). No warranties, use at your own risk, and these are NOT official Redhat kernels. Don't ask them for support for these, they won't do it, and they will look at you funny.

That said, I had also checked out the cvs tree to start the "Carlson" process :) indicated in the list a few months ago (see https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-October/003533.html) to build a more customized distribution. I got to the

Build the boot RPM

cd rocks/src/rocks/boot
make rpm

point, and lo and behold this is what I see ...

rm version.mk
rm arch
rm -f /local/rocks/src/rocks/boot/.rpmmacros
rm -f /usr/src/redhat/SOURCES/rocks-boot-3.1.0.tar
rm -f /usr/src/redhat/SOURCES/rocks-boot-3.1.0.tar.gz
...

Ok... I wanted to rebuild 3.0.0, as I cannot wait for 3.1.0 (my customer has a strong sense of urgency and little time to wait for an operational cluster). I checked out the system from CVS earlier this week.

Is there any way to switch the build back to 3.0.0? Or am I really out of luck at this moment??? Clues/hints welcome.

These kernels might work, though I don't have a method to try them in the distro yet. They work on the build machine.

[root at head root]# uname -a
Linux head.public 2.4.23-1 #1 SMP Sat Dec 13 14:41:06 GMT 2003 i686 unknown

[root at head root]# rpm -qa | grep -i kernel
kernel-2.4.23-1
kernel-BOOT-2.4.23-1
rocks-kernel-3.0.0-0
pvfs-kernel-1.6.0-1
kernel-doc-2.4.23-1
kernel-source-2.4.23-1
kernel-smp-2.4.23-1

The spec file is in the above download section, along with a .src.rpm and other stuff. If anyone does have a clue as to how to build with 3.0.0 given the current cvs, or if there is a tagged set I needed to get, please let me know.

Joe

--
Joseph Landman, Ph.D
Scalable Informatics LLC
email: landman at scalableinformatics.com
web: http://scalableinformatics.com
phone: +1 734 612 4615


From tim.carlson at pnl.gov Sat Dec 13 08:31:03 2003
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Sat, 13 Dec 2003 08:31:03 -0800 (PST)
Subject: [Rocks-Discuss]Trying to integrate a new kernel into 3.0.0
In-Reply-To: <[email protected]>
Message-ID: <[email protected]>

On Sat, 13 Dec 2003, Joe Landman wrote:

> That said, I had also checked out the cvs tree to start the "Carlson"
> process :) indicated in the list a few months ago (see

yikes.. ! :)

>
> Ok... I wanted to rebuild 3.0.0, as I cannot wait for 3.1.0 (my customer
> has a strong sense of urgency and little time to wait for an operational
> cluster). I checked out the system from CVS earlier this week.

You needed to check out the 3.0.0 tagged version

ROCKS_3_0_0_i386
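Tim's answer amounts to a one-line checkout against that tag. The CVSROOT below is an assumption (substitute the real anonymous Rocks CVS root), so the block only echoes the command rather than running it.

```shell
# Sketch: check out the 3.0.0-tagged tree instead of HEAD. The CVSROOT
# value is hypothetical -- use the real Rocks anonymous CVS root.
CVSROOT=":pserver:anonymous@cvs.rocksclusters.org:/home/cvs"
TAG=ROCKS_3_0_0_i386
echo cvs -d "$CVSROOT" checkout -r "$TAG" rocks
```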

Off thread, but it would seem to me that the numbering scheme for ROCKS got out of whack somewhere. Shouldn't 3.0.0 have been 2.3.3 and the new 3.1 been 3.0? The reasoning being that the current 3.0.0 is still RH 7.3 based and the new 3.1 will be RH 3.0 based. Not that it matters. Just curious.

Tim

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support

From phil at sdsc.edu Sat Dec 13 08:51:29 2003
From: phil at sdsc.edu (Philip Papadopoulos)
Date: Sat, 13 Dec 2003 08:51:29 -0800
Subject: [Rocks-Discuss]Trying to integrate a new kernel into 3.0.0
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

Tim Carlson wrote:

>On Sat, 13 Dec 2003, Joe Landman wrote:
>
>>That said, I had also checked out the cvs tree to start the "Carlson"


>>process :) indicated in the list a few months ago (see
>>
>>yikes.. ! :)
>>
>>Ok... I wanted to rebuild 3.0.0, as I cannot wait for 3.1.0 (my customer
>>has a strong sense of urgency and little time to wait for an operational
>>cluster). I checked out the system from CVS earlier this week.
>>
>>You needed to check out the 3.0.0 tagged version
>>ROCKS_3_0_0_i386
>
this is correct.

>Off thread, but it would seem to me that the numbering scheme for ROCKS
>got out of whack somewhere. Shouldn't 3.0.0 have been 2.3.3 and the new
>3.1 been 3.0? The reasoning being that the current 3.0.0 is still RH 7.3
>based and the new 3.1 will be RH 3.0 based. Not that it matters. Just
>curious.

I blame Bruno ... We moved to 3.0 because rolls is very different from the way 2.3.2 was put together -- this wasn't a minor change and so a subminor revision number didn't make sense.

3.0 --> 3.1: change from 7.3 to recompiled RHEL, change from PBS as default to SGE as default. .... OK, you could argue that this is also a major change and shouldn't have a minor version #. We didn't want to go from 3.0 to 4.0 for some non-definable reasons :-), but mostly it's that 3.0 and 3.1 feel pretty similar in terms of the way they are put together (with rolls).

-P

>Tim
>
>Tim Carlson
>Voice: (509) 376 3423
>Email: Tim.Carlson at pnl.gov
>EMSL UNIX System Support
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/attachments/20031213/69aa41fa/attachment-0001.html

From landman at scalableinformatics.com Sat Dec 13 11:14:51 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Sat, 13 Dec 2003 14:14:51 -0500


Subject: [Rocks-Discuss]Trying to integrate a new kernel into 3.0.0
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

Thanks. Magic incantations, and I have the "Carlson" process implemented. Ok, next step is the roll-my-own ... more later

On Sat, 2003-12-13 at 11:31, Tim Carlson wrote:
> On Sat, 13 Dec 2003, Joe Landman wrote:
>
> > That said, I had also checked out the cvs tree to start the "Carlson"
> > process :) indicated in the list a few months ago (see
>
> yikes.. ! :)
>
> >
> > Ok... I wanted to rebuild 3.0.0, as I cannot wait for 3.1.0 (my customer
> > has a strong sense of urgency and little time to wait for an operational
> > cluster). I checked out the system from CVS earlier this week.
>
> You needed to check out the 3.0.0 tagged version
>
> ROCKS_3_0_0_i386
>
> Off thread, but it would seem to me that the numbering scheme for ROCKS
> got out of whack somewhere. Shouldn't 3.0.0 have been 2.3.3 and the new
> 3.1 been 3.0? The reasoning being that the current 3.0.0 is still RH 7.3
> based and the new 3.1 will be RH 3.0 based. Not that it matters. Just
> curious.
>
> Tim
>
> Tim Carlson
> Voice: (509) 376 3423
> Email: Tim.Carlson at pnl.gov
> EMSL UNIX System Support

From wyzhong78 at msn.com Mon Dec 15 00:02:15 2003
From: wyzhong78 at msn.com (zhong wenyu)
Date: Mon, 15 Dec 2003 16:02:15 +0800
Subject: [Rocks-Discuss]about add-extra-nic
Message-ID: <[email protected]>

Hi, everyone! My compute node's motherboard is an MSI 9141, which has one 1000M NIC and one 100M NIC. I plan to use the 100M network for control and the 1000M network for applications, so I use a 100M switch to connect the compute nodes to the frontend, and a 1000M switch to connect the compute nodes to each other (not including the frontend). The first time I installed a compute node, it sat at "waiting for dhcp ip information" too long and I could not finish the install. I figured the 1000M NIC must be responsible, so I disabled it in the BIOS. After that the install worked and the compute nodes appeared. Then I wanted to add the extra NIC: I used the add-extra-nic command and shoot-node, the compute node rebooted (during the reboot I enabled the NIC), and it got stuck at "waiting for dhcp ip information" again.


So I disabled it again and restarted; the node reinstalled fine and finished with no trouble. I can even see the boot message "start eth1....[ok]"! But "ifconfig eth1" only gives an error, even after I enable the 1000M NIC again. Thanks and regards!


From Roy.Dragseth at cc.uit.no Mon Dec 15 02:31:51 2003
From: Roy.Dragseth at cc.uit.no (Roy Dragseth)
Date: Mon, 15 Dec 2003 11:31:51 +0100
Subject: [Rocks-Discuss]ia64 compute nodes with ia32 frontends?
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]> <[email protected]>
Message-ID: <[email protected]>

Hi.

I've been running a setup like this for over a year now; it will not (ever?) work right out of the box due to some kernel problems.

rocks-dist --arch ia64 dist

will most likely crash an ia32 frontend. The ia32 kernel doesn't like to mount a cramfs image generated on an ia64 machine; it gives me a kernel panic.

Here is a rough guide to get this kind of setup going.

1. Setup the ia32 as usual, but allow root write access to /export by inserting "no_root_squash" as an option in /etc/exports.

2. create a "fake" ia64 frontend using one of the ia64 nodes, let it configure eth0 by dhcp and let the ia32 frontend think it is a compute node.

3. on the fake frontend you turn off the nis daemons except ypbind.

4. edit /etc/auto.home to mount /home from the ia32 frontend and restart autofs.

5. on the fake frontend you do a rocks-dist copycd to dump the ia64 dvd into /home/install.

6. Now you can do a rocks-dist dist on the ia64 box.

7. At last you need a symlink to make the ia32 frontend happy:
ln -s enterprise/2.1AW/en/os/ia64 rocks-dist/7.3/en/os/ia64

Now you can boot up your ia64 nodes from the ia32 frontend. After you are confident that your ia64 nodes are installed correctly you can reinstall the ia64 frontend as a regular compute node. Subsequent rocks-dist dist runs can be done on any ia64 compute node as long as it has the anaconda-runtime and rocks-dist rpms installed.
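For reference, the distribution-building part of Roy's steps boils down to a short command transcript (a sketch only; it assumes the fake ia64 frontend already mounts /home from the ia32 frontend, and the exportfs step runs on the ia32 frontend after the no_root_squash edit from step 1):

```shell
# On the ia32 frontend: re-export /export after adding no_root_squash (step 1)
exportfs -ra

# On the fake ia64 frontend (steps 5 and 6):
cd /home/install
rocks-dist copycd     # dump the ia64 DVD into /home/install
rocks-dist dist       # build the ia64 distribution tree

# Step 7: symlink so the ia32 frontend finds the ia64 tree
ln -s enterprise/2.1AW/en/os/ia64 rocks-dist/7.3/en/os/ia64
```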

Hope this helps,


r.

--

The Computer Center, University of Tromsø, N-9037 TROMSØ Norway.
phone: +47 77 64 41 07, fax: +47 77 64 41 00

Roy Dragseth, High Performance Computing System Administrator Direct call: +47 77 64 62 56. email: royd at cc.uit.no

From Roy.Dragseth at cc.uit.no Mon Dec 15 04:28:15 2003
From: Roy.Dragseth at cc.uit.no (Roy Dragseth)
Date: Mon, 15 Dec 2003 13:28:15 +0100
Subject: [Rocks-Discuss]Gig E on HP ZX6000
In-Reply-To: <[email protected]>
References: <Pine.GSO.4.33.0312120850030.4235-100000@mercury.chem.northwestern.edu> <[email protected]>
Message-ID: <[email protected]>

I had similar problems on our HP rx2600 boxes and found a way to make the kernel ignore the 100Mb/s NIC by adding this append line in elilo.conf:

append="reserve=0xd00,64"

See my post https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-October/003483.html

for details on how to figure out this parameter.

Remark: this has to be modified both in elilo.conf and elilo-ks.conf in /boot/efi/efi/redhat/. The problem is that cluster-kickstart overwrites these files at every reboot, and the setup is hardcoded into the cluster-kickstart executable, so you need to figure out a way to work around this. I grabbed cluster-kickstart.c from cvs, made the necessary mods and installed the new one on every compute node.
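For illustration, the resulting stanza in /boot/efi/efi/redhat/elilo.conf would look something like this (the image, initrd, and root values are made-up placeholders; only the append line comes from Roy's post, and the same line goes into elilo-ks.conf):

```
image=vmlinuz-2.4.18-e.12smp
        label=linux
        initrd=initrd-2.4.18-e.12smp.img
        read-only
        root=/dev/sda2
        append="reserve=0xd00,64"
```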

r.

--

The Computer Center, University of Tromsø, N-9037 TROMSØ Norway.
phone: +47 77 64 41 07, fax: +47 77 64 41 00

Roy Dragseth, High Performance Computing System Administrator Direct call: +47 77 64 62 56. email: royd at cc.uit.no

From fds at sdsc.edu Mon Dec 15 11:31:01 2003
From: fds at sdsc.edu (Federico Sacerdoti)
Date: Mon, 15 Dec 2003 11:31:01 -0800
Subject: [Rocks-Discuss]Trying to integrate a new kernel into 3.0.0
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

We did indeed change our versioning scheme. We used to be "Redhat minus 5," so a RH 7.3-based Rocks was called 2.3.x. This became moot when Redhat


quickly went from 8 to 9 to Enterprise 3. So we decided to be selfish and move to 3.0.0 when we made a big internal change (Rolls and the end of monolithic Rocks).

3.1.0 is a minor number revision, which corresponds to how much has changed in the Rocks code, not the underlying Redhat system. A bugfix release would be 3.1.1, etc...

We hope this versioning scheme will be more resilient to linux system changes (which are out of our control), while keeping the focus on the Rocks structure.

On Dec 13, 2003, at 8:31 AM, Tim Carlson wrote:

> Off thread, but it would seem to me that the numbering scheme for ROCKS
> got out of whack somewhere. Shouldn't 3.0.0 have been 2.3.3 and the new
> 3.1 been 3.0? The reasoning being that the current 3.0.0 is still RH 7.3
> based and the new 3.1 will be RH 3.0 based. Not that it matters. Just
> curious.

Federico

Rocks Cluster Group, San Diego Supercomputing Center, CA

From jlkaiser at fnal.gov Mon Dec 15 11:43:43 2003
From: jlkaiser at fnal.gov (Joseph L. Kaiser)
Date: Mon, 15 Dec 2003 13:43:43 -0600
Subject: [Rocks-Discuss]problem forcing a kernel
Message-ID: <[email protected]>

Hi,

I am trying to install this kernel:

kernel-smp-2.4.20-20.XFS1.3.1.i686.rpm and keep getting the following whether I put it in the force directory of my distro or the regular RPMS directory or contrib:

During package installation it gives me this:

/mnt/sysimage/var/tmpkernel-smp-2.4.20-20.9.XFS1.3.1.i686.rpm cannot be opened. This is due to a missing file, a bad package, or bad media. Press <return> to try again.

The file is there. The media is the network. I have installed the package on other systems by hand. Any ideas?

Thanks,

Joe


From tmartin at physics.ucsd.edu Mon Dec 15 15:58:51 2003
From: tmartin at physics.ucsd.edu (Terrence Martin)
Date: Mon, 15 Dec 2003 15:58:51 -0800
Subject: [Rocks-Discuss]removing a node from the cluster
Message-ID: <[email protected]>

How does one go about removing a node from the cluster? Is there a straightforward way to do this?

Terrence

From ebpeele2 at pams.ncsu.edu Mon Dec 15 16:42:47 2003
From: ebpeele2 at pams.ncsu.edu (Elliot Peele)
Date: Mon, 15 Dec 2003 19:42:47 -0500
Subject: [Rocks-Discuss]removing a node from the cluster
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

insert-ethers --replace hostname

Select compute from the menu then exit insert-ethers.

Elliot

On Mon, 2003-12-15 at 18:58, Terrence Martin wrote:
> How does one go about removing a node from the cluster? Is there a
> straight forward way to do this?
>
> Terrence
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/attachments/20031215/ebf9581b/attachment-0001.bin

From phil at sdsc.edu Mon Dec 15 16:44:29 2003
From: phil at sdsc.edu (Philip Papadopoulos)
Date: Mon, 15 Dec 2003 16:44:29 -0800
Subject: [Rocks-Discuss]removing a node from the cluster
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

insert-ethers --replace "compute-0-0"
select "compute" from the menu and then hit f1 to exit.

This will re-create all of the files that have host names and remove the node (you are essentially replacing the node named "compute-0-0" with the empty set).

PBS will likely be unhappy with this change -- If I remember correctly, it has an


additional file that it creates when a node is added to the queuing system -- when the node doesn't appear in the host table, it gets cranky. You should look in /opt/OpenPBS/server_priv/nodes to solve this problem -- suppose you want to delete compute-0-0.

# qmgr -c "delete node compute-0-0"
# insert-ethers --replace "compute-0-0"

-P

Terrence Martin wrote:

> How does one go about removing a node from the cluster? Is there a
> straight forward way to do this?
>
> Terrence

--
== Philip Papadopoulos, Ph.D.        San Diego Supercomputer Center
== Program Director for              9500 Gilman Drive
==   Grid and Cluster Computing      University of California, San Diego
== Ph: (858) 822-3628                La Jolla, CA 92093-0505
== FAX: (858) 822-5407

From gotero at linuxprophet.com Mon Dec 15 16:52:23 2003
From: gotero at linuxprophet.com (Glen Otero)
Date: Mon, 15 Dec 2003 16:52:23 -0800
Subject: [Rocks-Discuss]removing a node from the cluster
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]>
Message-ID: <[email protected]>

On Dec 15, 2003, at 4:42 PM, Elliot Peele wrote:

> insert-ethers --replace hostname
>
> Select compute from the menu then exit insert-ethers.

Then run:

# insert-ethers --update

to update the database

Check the database entries with:


# dbreport hosts

Glen

>
> Elliot
>
> On Mon, 2003-12-15 at 18:58, Terrence Martin wrote:
>> How does one go about removing a node from the cluster? Is there a
>> straight forward way to do this?
>>
>> Terrence
>
Glen Otero, Ph.D.
Linux Prophet
619.917.1772
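Taken together, Elliot's and Glen's replies give this removal sequence (node name illustrative; the first command is interactive):

```shell
insert-ethers --replace compute-0-0   # interactive: select "compute", then exit
insert-ethers --update                # regenerate the database-driven config files
dbreport hosts                        # confirm the node is gone from the host table
```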

From landman at scalableinformatics.com Mon Dec 15 17:13:29 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Mon, 15 Dec 2003 20:13:29 -0500
Subject: [Rocks-Discuss]removing a node from the cluster
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]> <[email protected]>
Message-ID: <[email protected]>

Harumph:

rmnode nasty_compute_node
insert-ethers --update

(rmnode at http://scalableinformatics.com/downloads/rmnode.gz).

I thought insert-ethers had a simple version of this. All rmnode is, is a hacked version of one of the other rocks tools.

Joe

Glen Otero wrote:

>
> On Dec 15, 2003, at 4:42 PM, Elliot Peele wrote:
>
>> insert-ethers --replace hostname
>>
>> Select compute from the menu then exit insert-ethers.
>
> Then run:
>
> # insert-ethers --update
>
> to update the database
>


> Check the database entries with:
>
> # dbreport hosts
>
> Glen
>
>>
>> Elliot
>>
>> On Mon, 2003-12-15 at 18:58, Terrence Martin wrote:
>>
>>> How does one go about removing a node from the cluster? Is there a
>>> straight forward way to do this?
>>>
>>> Terrence
>>
> Glen Otero, Ph.D.
> Linux Prophet
> 619.917.1772

--
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
phone: +1 734 612 4615

From csamuel at vpac.org Mon Dec 15 18:06:47 2003
From: csamuel at vpac.org (Chris Samuel)
Date: Tue, 16 Dec 2003 13:06:47 +1100
Subject: [Rocks-Discuss]ScalablePBS.
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]>
Message-ID: <[email protected]>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sat, 13 Dec 2003 06:16 am, Mason J. Katz wrote:

> This should become the basis of the PBS roll (currently openpbs). We> are seeking developers who would like to help write and maintain this> -- I'm not singling you out Roy, although you would be more than> welcome, rather I'm taking advantage of your message to solicit other> volunteers. Anyone?

I think we might be interested in getting involved with this, we migrated from OpenPBS to ScalablePBS some time ago and spent quite a bit of time tracking down memory leaks and the like with DJ and friends at SuperCluster.

We've also started using Rocks on a cluster that we manage for one of our member institutions and a colleague of mine is having fun trying to get it to go onto an Itanium cluster at the moment plus we should have some Opteron boxes arriving in a month or so for a mini-cluster which we'd like to run


Rocks on.

Currently we install Rocks on the cluster and then remove PBS and MAUI RPM's and install SPBS and the 3.2.6 version of MAUI we have access to, so a version that came with SPBS ready to go would make life a lot simpler for us. :-)

cheers!
Chris
- --
 Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin
 Victorian Partnership for Advanced Computing http://www.vpac.org/
 Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE/3mi3O2KABBYQAh8RAuSLAJ9Bx/5aCF8kRjHFapUpiASQUJeCTwCcD9y7
Y/ZM38t0J8r5dAYj1MdiUWA=
=bCIS
-----END PGP SIGNATURE-----

From bruno at rocksclusters.org Mon Dec 15 18:30:03 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Mon, 15 Dec 2003 18:30:03 -0800
Subject: [Rocks-Discuss]removing a node from the cluster
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]> <[email protected]> <[email protected]>
Message-ID: <[email protected]>

> Harumph:
>
> rmnode nasty_compute_node
> insert-ethers --update
>
> (rmnode at http://scalableinformatics.com/downloads/rmnode.gz).
>
> I thought insert-ethers had a simple version of this. All rmnode is,
> is a hacked version of one of the other rocks tools.

actually, since v3.0.0, i think it does:

http://www.rocksclusters.org/rocks-documentation/3.0.0/faq-configuration.html#REMOVE-NODE

- gb

From bruno at rocksclusters.org Mon Dec 15 19:40:49 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Mon, 15 Dec 2003 19:40:49 -0800
Subject: [Rocks-Discuss]problem forcing a kernel
In-Reply-To: <[email protected]>


References: <[email protected]>Message-ID: <[email protected]>

> I am trying to install this kernel:
>
> kernel-smp-2.4.20-20.XFS1.3.1.i686.rpm and keep getting the following
> whether I put it in the force directory of my distro or the regular
> RPMS
> directory or contrib:
>
> During package installation it gives me this:
>
>
> /mnt/sysimage/var/tmpkernel-smp-2.4.20-20.9.XFS1.3.1.i686.rpm cannot be
> opened. This is due to a missing file, a bad package, or bad media.
> Press <return> to try again.
>
>
> The file is there. The media is the network. I have installed the
> package on other systems by hand. Any ideas?

just to be sure, do you run the following after you copy the RPM into the force directory:

# cd /home/install
# rocks-dist dist
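After that rebuild, a quick way to confirm the forced package actually landed in the distribution tree is to search for it (a sketch; the path follows the default /home/install layout and the pattern matches the kernel from this thread):

```shell
find /home/install/rocks-dist -name 'kernel-smp-2.4.20-20*XFS*' -print
```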

- gb

From bruno at rocksclusters.org Mon Dec 15 19:56:51 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Mon, 15 Dec 2003 19:56:51 -0800
Subject: [Rocks-Discuss]Adding partitions that are not reformatted under hard boots or shoot-node
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

sorry for the late response.

i recently tested the manual partitioning procedure on our upcoming release and there was a bug. a fix has been committed for the next release -- so manual partitioning will work on 3.1.0 as explained in the 3.0.0 documentation.

- gb

On Dec 9, 2003, at 6:55 PM, Jorge L. Rodriguez wrote:

> Hi,
>
> How do I add an extra partition to my compute nodes and retain the
> data on all non / partitions when system hard boots or is shot?
> I tried the suggestion in the documentation under "Customizing your
> ROCKS Installation" where you replace the auto-partition.xml but hard
> boots or shoot-nodes on these reformat all partitions instead of just


> the /. I have also tried to modify the installclass.xml so that an
> extra partition is added into the python code see below. This does
> mostly what I want but now I can't shoot-node even though a hard boot
> reinstalls without reformatting all but /. Is this the right approach?
> I'd rather avoid having to replace installclass since I don't really
> want to partition all nodes this way but if I must I will.
>
> Jorge
>
>         #
>         # set up the root partition
>         #
>         args = [ "/" , "--size" , "4096",
>                  "--fstype", "&fstype;",
>                  "--ondisk", devnames[0] ]
>         KickstartBase.definePartition(self, id, args)
>
>         # ---- Jorge, I added this
>         args = [ "/state/partition1" , "--size" , "55000",
>                  "--fstype", "&fstype;",
>                  "--ondisk", devnames[0] ]
>         KickstartBase.definePartition(self, id, args)
>         # -----
>
>         args = [ "swap" , "--size" , "1000",
>                  "--ondisk", devnames[0] ]
>         KickstartBase.definePartition(self, id, args)
>
>         #
>         # greedy partitioning
>         #
>         # ----- Jorge, I change this from i = 1
>         i = 2
>         # -----
>         for devname in devnames:
>                 partname = "/state/partition%d" % (i)
>                 args = [ partname, "--size", "1",
>                          "--fstype", "&fstype;",
>                          "--grow", "--ondisk", devname ]
>                 KickstartBase.definePartition(self, id, args)
>
>                 i = i + 1

From jlkaiser at fnal.gov Mon Dec 15 20:17:52 2003
From: jlkaiser at fnal.gov (Joseph L. Kaiser)
Date: Mon, 15 Dec 2003 22:17:52 -0600
Subject: [Rocks-Discuss]problem forcing a kernel
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]>
Message-ID: <[email protected]>

yup


On Mon, 2003-12-15 at 21:40, Greg Bruno wrote:
> > I am trying to install this kernel:
> >
> > kernel-smp-2.4.20-20.XFS1.3.1.i686.rpm and keep getting the following
> > whether I put it in the force directory of my distro or the regular
> > RPMS
> > directory or contrib:
> >
> > During package installation it gives me this:
> >
> >
> > /mnt/sysimage/var/tmpkernel-smp-2.4.20-20.9.XFS1.3.1.i686.rpm cannot be
> > opened. This is due to a missing file, a bad package, or bad media.
> > Press <return> to try again.
> >
> >
> > The file is there. The media is the network. I have installed the
> > package on other systems by hand. Any ideas?
>
> just to be sure, do you run the following after you copy the RPM into
> the force directory:
>
> # cd /home/install
> # rocks-dist dist
>
> - gb
>

From Roy.Dragseth at cc.uit.no Tue Dec 16 02:13:50 2003
From: Roy.Dragseth at cc.uit.no (Roy Dragseth)
Date: Tue, 16 Dec 2003 11:13:50 +0100
Subject: [Rocks-Discuss]ScalablePBS.
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]>
Message-ID: <[email protected]>

On Friday 12 December 2003 20:16, Mason J. Katz wrote:
> This should become the basis of the PBS roll (currently openpbs). We
> are seeking developers who would like to help write and maintain this
> -- I'm not singling you out Roy, although you would be more than
> welcome, rather I'm taking advantage of your message to solicit other
> volunteers. Anyone?
>

I talked to my boss and he gave me thumbs up, so I'll be glad to take care of the Maui/PBS roll of rocks.

I'd love to see some more hands in the air as maintainers/testers...

r.

--

The Computer Center, University of Tromsø, N-9037 TROMSØ Norway.


phone: +47 77 64 41 07, fax: +47 77 64 41 00
Roy Dragseth, High Performance Computing System Administrator

Direct call: +47 77 64 62 56. email: royd at cc.uit.no

From daniel.kidger at quadrics.com Tue Dec 16 07:08:44 2003
From: daniel.kidger at quadrics.com (Dan Kidger)
Date: Tue, 16 Dec 2003 15:08:44 +0000
Subject: [Rocks-Discuss]custom-kernels : naming conventions ? (Rocks 3.0.0)
In-Reply-To: <20031209180224.24711.h014.c001.wm@mail.linuxprophet.com.criticalpath.net>
References: <20031209180224.24711.h014.c001.wm@mail.linuxprophet.com.criticalpath.net>
Message-ID: <[email protected]>

Glen et al.

>I recently had the same problem when building a quadrics cluster on Rocks 2.3.2
>with the qsnet-RedHat-kernel-2.4.18-27.3.4qsnet.i686.rpms. The problem is
>definitely in the naming of the rpms, in that anaconda running on the compute
>nodes is not going to recognize kernel rpms that begin with 'qsnet' as potential
>boot options. Unfortunately, being under a severe time constraint, I resorted to
>manually installing the qsnet kernel on all nodes of the cluster, which isn't
>the Rocks way. The long term solution is to mangle the kernel makefiles so that
>the qsnet kernel rpms have conventional kernel rpm names, which is what Greg's
>post referred to.

I have been thinking about this.

I reckon that the long term solution is *not* to rename the kernel that we use (nor indeed to change the naming convention of any other kernels that people want to work on). As well as the triplet version numbering and the architecture, the kernel naming that we use includes the kernel source tree (Redhat, Suse, LSY, Vanilla, ..) and our patch level version numbering triplet. Quadrics cannot be the only people who need freedom to include extra information in our naming convention for kernels. The solution must lie either in anaconda itself or, more likely, in a cleaner way to include extra kernel(s) as well as the stock one in the compute node install process. Using extend-nodes.xml this works, apart from niggles about the /boot/grub/menu.lst that our kernel post-install configures getting clobbered by Rocks.

Yours,Daniel.

gotero at linuxprophet.com wrote:

>Daniel-
>

--
Yours,
Daniel.


--------------------------------------------------------------
Dr. Dan Kidger, Quadrics Ltd.      daniel.kidger at quadrics.com
One Bridewell St., Bristol, BS1 2AA, UK          0117 915 5505
----------------------- www.quadrics.com --------------------

From mjk at sdsc.edu Tue Dec 16 07:09:56 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Tue, 16 Dec 2003 07:09:56 -0800
Subject: [Rocks-Discuss]ScalablePBS.
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]> <[email protected]>
Message-ID: <[email protected]>

Fantastic! I think this puts us at three people who have volunteered to help out on this. I will follow up on this and help organize, support, and do some of the development also. But I'm going to push this back until after we get 3.1 out, which looks like Monday.

-mjk

On Dec 16, 2003, at 2:13 AM, Roy Dragseth wrote:

> On Friday 12 December 2003 20:16, Mason J. Katz wrote:
>> This should become the basis of the PBS roll (currently openpbs). We
>> are seeking developers who would like to help write and maintain this
>> -- I'm not singling you out Roy, although you would be more than
>> welcome, rather I'm taking advantage of your message to solicit other
>> volunteers. Anyone?
>>
>
> I talked to my boss and he gave me thumbs up, so I'll be glad to take
> care of
> the Maui/PBS roll of rocks.
>
> I'd love to see some more hands in the air as maintainers/testers...
>
> r.
>
>
> --
>
> The Computer Center, University of Tromsø, N-9037 TROMSØ Norway.
> phone:+47 77 64 41 07, fax:+47 77 64 41 00
> Roy Dragseth, High Performance Computing System Administrator
> Direct call: +47 77 64 62 56. email: royd at cc.uit.no

From mjk at sdsc.edu Tue Dec 16 07:37:04 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Tue, 16 Dec 2003 07:37:04 -0800
Subject: [Rocks-Discuss]custom-kernels : naming conventions ? (Rocks 3.0.0)
In-Reply-To: <[email protected]>
References:


<20031209180224.24711.h014.c001.wm@mail.linuxprophet.com.criticalpath.net> <[email protected]>
Message-ID: <[email protected]>

If you rename the linux kernel to include other arbitrary strings, the RedHat Kickstart installer will not recognize it as a kernel. This means you lose probing for the correct x86 cpu (386/486/586/686) and probing for SMP vs. uni. This implies you would need to re-write the anaconda code to do this for arbitrarily named packages; if you could convince RedHat to do this, great, but it's not worth our development time to do this ourselves when properly named kernel packages work wonderfully. The unfortunate reality is the kernel RPM is not just another package -- it has some special installation logic to optimize for your hardware. Sure, they could have done this better, but they do a darn good job as is.

This is not a Rocks issue, it means you have created a package that does not work with RedHat. I understand why you need to include extra strings in the kernel name, but suggest that there are several alternatives to this that don't break RedHat kickstart. For example, you could:

- Write a kernel version module to report on /proc/qsnet_kernel the same information.

- Have the kernel RPM install a /usr/doc/qsnet/VERSION file

- Have a subpackage of the kernel rpm that includes the extra strings (and extra docs).

- Stop patching the kernel and only use a module. True some things require kernel patches, but almost all driver changes can go into modules only. This was not always true a few years ago, the module system has improved a lot.

We've faced numerous issues like this with RedHat in creating Rocks, and for every issue we have found a work around that keeps us w/in the RedHat way of doing things. This is not always optimal for development but always yields a simpler, and more supportable, system.
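The naming constraint Mason describes can be illustrated with a toy shell function (a sketch that mimics, not reproduces, anaconda's matching: it strips the arch, release, and version fields to recover the RPM package name the installer keys on; the filenames are the ones from this thread):

```shell
# Recover the package name from an RPM filename: drop ".rpm", then the
# architecture suffix, then the trailing version-release pair.
rpm_base_name() {
    echo "$1" | sed -e 's/\.rpm$//' -e 's/\.[^.]*$//' -e 's/-[^-]*-[^-]*$//'
}

# A name of "kernel" or "kernel-smp" is what the installer treats as a kernel;
# "qsnet-RedHat-kernel" is not, even though the package contains a kernel.
rpm_base_name "kernel-smp-2.4.20-20.XFS1.3.1.i686.rpm"
rpm_base_name "qsnet-RedHat-kernel-2.4.18-27.3.4qsnet.i686.rpm"
```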

-mjk

On Dec 16, 2003, at 7:08 AM, Dan Kidger wrote:

> Glen et al.
>
>> I recently had the same problem when building a quadrics cluster on
>> Rocks 2.3.2
>> with the qsnet-RedHat-kernel-2.4.18-27.3.4qsnet.i686.rpms. The
>> problem is
>> definitely in the naming of the rpms, in that anaconda running on the
>> compute
>> nodes is not going to recognize kernel rpms that begin with 'qsnet'
>> as potential
>> boot options. Unfortunately, being under a severe time contraint, I
>> resorted to
>> manually installing the qsnet kernel on all nodes of the cluster,
>> which isn't


>> the Rocks way. The long term solution is to mangle the kernel
>> makefiles so that
>> the qsnet kernel rpms have conventional kernel rpm names, which is
>> what Greg's
>> post referred to.
>
> I have been thinking about this.
>
> I reckon that the long term solution is *not* to rename the kernel
> that we use. (nor indeed to change the naming convention of any other
> kernels that people want to work on). As well as the triplet version
> numbering and the architecture, the kernel naming that we use includes
> the kernel source tree (Redhat, Suse, LSY, Vanilia, ..) and our partch
> level version numering triplet.
> Quadrics cannot be the only people who need freedom to include extra
> information in our naming convention for kernels.
> The solution must lie in either annaconda itself or more likely a
> cleaner way to include extra kernel(s) as well as the stock one in the
> compute node install process. Using extend-nodes.xml this works apart
> from niggles about the /boot/grub/menu.lst that our kernel
> post-instal;l configures getting clobbered by Rocks.
>
> Yours,
> Daniel.
>
>
> gotero at linuxprophet.com wrote:
>
>> Daniel-
>>
>
> --
> Yours,
> Daniel.
>
> --------------------------------------------------------------
> Dr. Dan Kidger, Quadrics Ltd. daniel.kidger at quadrics.com
> One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505
> ----------------------- www.quadrics.com --------------------
>

From dtwright at uiuc.edu Tue Dec 16 11:45:55 2003
From: dtwright at uiuc.edu (Dan Wright)
Date: Tue, 16 Dec 2003 13:45:55 -0600
Subject: [Rocks-Discuss]a minor ganglia question
Message-ID: <[email protected]>

Hello all,

I'm in the process of setting up a 3.0.0 cluster and have a question about the "Physical view" in ganglia. In this view (which is quite cool BTW :) it shows higher-numbered nodes on top and lower-numbered nodes on bottom:

compute-0-12
...
compute-0-2


compute-0-1
compute-0-0

and my cluster is physically reversed from that:

compute-0-0
compute-0-1
compute-0-2
...
compute-0-12

Is there an easy way to switch this display around so it matches the real physical layout? I poked around in ganglia for a few minutes and didn't see anything obvious, so I thought I'd ask before I actually start wasting time on this :)

Thanks,

- Dan Wright
(dtwright at uiuc.edu)
(http://www.scs.uiuc.edu/)
(UNIX Systems Administrator, School of Chemical Sciences, UIUC)
(333-1728)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/attachments/20031216/28f3eb5a/attachment-0001.bin

From purikk at hotmail.com Tue Dec 16 12:34:51 2003
From: purikk at hotmail.com (Purushotham Komaravolu)
Date: Tue, 16 Dec 2003 15:34:51 -0500
Subject: [Rocks-Discuss]hardware-setup for the Rocks cluster
References: <[email protected]>
Message-ID: <[email protected]>

Hi All,
We are trying to setup rocks cluster with 1 front and 20 computing nodes.
Frontend:
1) Dual Pentium Xeon 2.4 GHz PC 533 and 512k L2 Cache
2) Dual port Gigabit Ethernet
3) 1 GB DDR RAM
4) 3 * 200 GB EIDE ULTRA ATA 100

Compute nodes:
1) Pentium Xeon 2.4 GHz PC 533 and 512k L2 Cache
2) Dual port Gigabit Ethernet
3) 1 GB DDR RAM
4) 41 GB UDMA EIDE
1 HP Procurve 24 port switch

Does the setup look ok?

Does Rocks support the following features:

*Remote power monitoring for individual nodes

*Temperature monitoring of individual processors

*Power sequencing on startup to prevent possible power spiking

*Remote power-down and reset of system and nodes

*Serial access to nodes

*Disk cloning

*Plug-In Extensible Architecture

*Image Manager

and also

How should the disks be set up? Do all the disks need to be attached to the frontend, with compute nodes having small 3 or 4 GB disks?

Can someone point me to clustering software which supports all the above features if Rocks doesn't support them?

thanks a lot

Regards,

Puru

From purikk at hotmail.com Tue Dec 16 12:39:19 2003
From: purikk at hotmail.com (Purushotham Komaravolu)
Date: Tue, 16 Dec 2003 15:39:19 -0500
Subject: [Rocks-Discuss]Java Rocks cluster
Message-ID: <[email protected]>

I am a newbie to ROCKS.
I have a question about running Java on a Rockster. Is it possible that I can start only one JVM on one machine and have the task run distributed on the cluster? It is a multi-threaded application. Say I have an application with 100 threads; can I have 50 threads run on one machine and 50 on another by launching the application (JVM) on one machine (similar to SUN Firebird)? I don't want to use MPI or any special code.
Thanks
Sincerely
Puru
-------------- next part --------------
An HTML attachment was scrubbed...
URL: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/attachments/20031216/ee12ac80/attachment-0001.html

From mjk at sdsc.edu Tue Dec 16 13:20:24 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Tue, 16 Dec 2003 13:20:24 -0800
Subject: [Rocks-Discuss]Java Rocks cluster
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

There are a few research projects that do map Java threads onto cluster compute node processes. At the IEEE Cluster '03 conference a couple weeks ago in Hong Kong there were a few interesting Java talks on this subject. You can see the schedule at the following link and do some google research for more info. I think the papers will be online soon...

http://www.csis.hku.hk/cluster2003/advance-program.html

Rocks 3.1 will include a Java Roll, but this is nothing more than Sun's Java sdk/rte and doesn't do any cluster magic for you.

-mjk

On Dec 16, 2003, at 12:39 PM, Purushotham Komaravolu wrote:

> I am a newbie to ROCKS
> I have a question about running Java on a Rockster.
> Is it possible that I can start only one JVM on one machine and the
> task be run distributed on the cluster? It is a multi-threaded
> application.
> Like say, I have an application with 100 threads. Can I have 50
> threads run on one machine and 50 on another by launching the
> application(jvm) on one machine?(similar to SUN Firebird) I dont want
> to use MPI or any special code.
> Thanks
> Sincerely
> Puru

From phil at sdsc.edu Tue Dec 16 13:38:48 2003
From: phil at sdsc.edu (Philip Papadopoulos)
Date: Tue, 16 Dec 2003 13:38:48 -0800
Subject: [Rocks-Discuss]hardware-setup for the Rocks cluster
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]>
Message-ID: <[email protected]>

Purushotham Komaravolu wrote:

> Hi All,
> We are trying to setup rocks cluster with 1 front and 20 computing
> nodes.
> Frontend:
> 1) Dual Pentium Xeon 2.4 GHz PC 533 and 512lk L2 Cache
> 2) Dual port Gigabit Ethernet
> 3) 1 GB DDR RAM
> 4) 3* 200 GB EIDE ULTRA ATA 100

> Compute nodes:
> 1) Pentium Xeon 2.4 GHz PC 533 and 512k L2 Cache
> 2) Dual port Gigabit Ethernet
> 3) 1 GB DDR RAM
> 4) 41 GB UDMA EIDE
> 1 HP Procurve 24 port switch
>
> Does the setup look ok?

Setup looks fine.

> Does Rocks support the following features
> Remote power monitoring for individual nodes
>
> *Temperature monitoring of individual processors

Not directly -- there isn't a completely general solution to this -- though lm_sensors is good for non-server boards. However, nothing prevents you from adding the proper software. It's fairly easy to add metrics to ganglia if you have the baseline drivers for your particular temp monitoring software.
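As a sketch of what "adding metrics to ganglia" can look like: gmetric is Ganglia's command-line tool for injecting custom metrics. The metric name, the lm_sensors output format, and the exact gmetric flags below are illustrative assumptions; check `gmetric --help` and your own `sensors` output before relying on them.

```python
import re

def parse_cpu_temp(sensors_output):
    """Pull the first Celsius reading out of lm_sensors-style output,
    e.g. 'CPU Temp: +47.0 C'. The format varies by motherboard, so
    this regex is only a starting point."""
    m = re.search(r"([+-]?\d+(?:\.\d+)?)\s*C\b", sensors_output)
    return float(m.group(1)) if m else None

def gmetric_cmd(name, value, units):
    """Build an argv list for gmetric to publish one custom metric."""
    return ["gmetric", "--name", name, "--value", str(value),
            "--type", "float", "--units", units]

temp = parse_cpu_temp("CPU Temp:  +47.0 C  (limit = +60 C)")
if temp is not None:
    # In a cron job on each node you would exec this command; here we
    # only print it, since gmetric may not be installed everywhere.
    print(" ".join(gmetric_cmd("cpu_temp", temp, "Celsius")))
```

Run periodically on each node, the published value then shows up alongside the stock ganglia metrics for that host.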

> *Power sequencing on startup to prevent possible power spiking
>
> *Remote power-down and reset of system and nodes
>
> *Serial access to nodes

All of these generally require another network (serial, lights-out management, etc). We don't assume any of these extra networks exist. Again, layering that functionality atop Rocks is very straightforward. See the FAQ for how to add packages to nodes.

> *Disk cloning

No. Emphatically no. Disk cloning is not anywhere in the Rocks vocabulary. We have distributions (Redhat + Rocks + cluster tools + your own software) and a way to generate a kickstart file in a programmatic way. Disk cloning assumes homogeneity of hardware (we don't), requires a custom after-market installer to fix up a node after an image is put on it (we use Redhat as the installer), and requires a completely different image for every different functional type of node (frontend, compute, nfs, web, pvfs, etc).

> *Plug-In Extensible Architecture

Uh. Yeah. That's the whole point. Again, see the FAQ for how you add packages. Rolls are an additional extension mechanism that allows you to add larger chunks of functionality at cluster build time. We extend base Rocks with Grid software, schedulers, Java, and community-specific software stacks. You should wait (about 5 days) for the final release of 3.1.0 to see how rolls work.

> *Image Manager

Definitely no. There are no images in Rocks. We have distributions and appliance types. A graph description of appliances is melded with distributions to define a complete node. Shared configuration is truly shared. None of that happens with images -- the base software and the configuration are locked together.

> and also
>
> How should be the disk setup, does all the disks need to be attached to
> frontend and compute nodes have small 3 or 4 GB disks?

Nodes must be diskful, of any type and size (8GB is probably minimal given the size of Linux these days). You can put as many disks as you want on your frontend and have it double as an NFS server for your cluster (the default). You can build other NFS servers easily (and manage them as easily as you do a compute node).

> Can someone point me to a clustering software which supports all above
> features if Rocks does'nt support them.

Sorry, that doesn't exist. Pick the things that you can live without today (but would want to add tomorrow).

-P

> thanks a lot
>
> Regards,
>
> Puru

--
== Philip Papadopoulos, Ph.D.     San Diego Supercomputer Center
== Program Director for           9500 Gilman Drive
== Grid and Cluster Computing     University of California, San Diego
== Ph: (858) 822-3628             La Jolla, CA 92093-0505
== FAX: (858) 822-5407

From mjk at sdsc.edu Tue Dec 16 13:38:59 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Tue, 16 Dec 2003 13:38:59 -0800
Subject: [Rocks-Discuss]hardware-setup for the Rocks cluster
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]>
Message-ID: <[email protected]>

On Dec 16, 2003, at 12:34 PM, Purushotham Komaravolu wrote:

> Hi All,
> We are trying to setup rocks cluster with 1 front and 20
> computing nodes.
> Frontend:
> 1) Dual Pentium Xeon 2.4 GHz PC 533 and 512lk L2 Cache
> 2) Dual port Gigabit Ethernet
> 3) 1 GB DDR RAM
> 4) 3* 200 GB EIDE ULTRA ATA 100
>
> Compute nodes:
> 1) Pentium Xeon 2.4 GHz PC 533 and 512k L2 Cache
> 2) Dual port Gigabit Ethernet
> 3) 1 GB DDR RAM
> 4) 41 GB UDMA EIDE
> 1 HP Procurve 24 port switch
>
> Does the setup look ok?

Sounds good. If you have device driver issues, just wait until next week when 3.1 comes out; it will have a new kernel and more supported hardware.

> Does Rocks support the following features> Remote power monitoring for individual nodes

Ethernet addressable power strips can be used for this.

> *Temperature monitoring of individual processors

No, although a ganglia module can be created to do this. The problem is there isn't a common standard out there for *all* hardware right now.

> *Power sequencing on startup to prevent possible power spiking

Ethernet addressable power strips can be used for this.

> *Remote power-down and reset of system and nodes

Yes (using sw). For hw control you would need a remote management board in every node, or ethernet addressable power strips.

> *Serial access to nodes

No, Rocks uses ssh and ethernet for this. But you can add your own serial port concentrator if you need one.

> *Disk cloning

Nope, this doesn't scale in either system or people time. Rocks uses RedHat's Kickstart to build the disk image on each node in a cluster programmatically. This is extremely fast -- in fact a 128 node cluster can be built from scratch (including hardware integration) in under 2 hours, and the entire cluster can be reinstalled in around 12 minutes. We did this as a demonstration of Rocks' scalability at SC'03 (we even have a movie of it).

> *Plug-In Extensible Architecture

Yes. You can add to the cluster database and extend our utilities. Everything is open.

> *Image Manager

Rocks does not do system imaging. We have a utility called rocks-dist that builds distributions for you. This combined with the XML profile graph gives you what you want here.

> How should be the disk setup, does all the disks need to be attached to
> frontend and compute nodes have small 3 or 4 GB disks?

Buy the smallest modern HD you can for the compute node (4 GB is fine). By default the frontend serves user directories over NFS so you should have more storage on the frontend node.

-mjk

From landman at scalableinformatics.com Tue Dec 16 13:43:51 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Tue, 16 Dec 2003 16:43:51 -0500
Subject: [Rocks-Discuss]Java Rocks cluster
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

Hi Puru:

Java threads are shared memory objects at this moment. You would need to look at thread-migration schemes to layer atop the process, and a distributed shared memory model to handle memory issues. I don't think Java natively supports this, so you will likely have to appeal to some other method.

Moreover, shared memory across slower cluster network fabrics is painful at best. If you are going to work on a single system image machine with shared memory, you want the fastest/best fabric you can get.

If it is easier to re-architect your code as independent worker processes, you could write it using JVMs and simple sockets or similar. If it is threaded, you may have problems parallelizing it on a cluster.

Joe

On Tue, 2003-12-16 at 15:39, Purushotham Komaravolu wrote:
> I am a newbie to ROCKS
> I have a question about running Java on a Rockster.
> Is it possible that I can start only one JVM on one machine and the
> task be run distributed on the cluster? It is a multi-threaded
> application.
> Like say, I have an application with 100 threads. Can I have 50
> threads run on one machine and 50 on another by launching the
> application(jvm) on one machine?(similar to SUN Firebird) I dont want
> to use MPI or any special code.
> Thanks
> Sincerely
> Puru
--
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
phone: +1 734 612 4615

From rscarce at caci.com Tue Dec 16 10:56:18 2003
From: rscarce at caci.com (Reed Scarce)
Date: Tue, 16 Dec 2003 13:56:18 -0500
Subject: [Rocks-Discuss]grub / boot / fdisk problem
Message-ID: <OF2C6AD168.EB3D778E-ON85256DFE.0067CF1C-85256DFE.006812B4@caci.com>

I installed Rocks on a primary master hard drive. It became necessary to re-install, so I took an identical hd and made it primary master. The first drive, which boots fine, was left off the system to act as an archive, to mount after the new system was up and running. The new system was installed and works great; now to correctly install the old drive as primary slave, reboot, mount and copy the scripts and configs to the new system!

There the problem began.

When I boot either drive as primary master and only primary drive, they boot fine. When I connect either drive, correctly configured and recognized by the BIOS, as primary or secondary slave, grub gives a GRUB prompt and won't boot. Something interesting: when booted from a floppy (mkbootdisk) from the new disk, in /var/log/dmesg both drives are visible but fdisk reports the partition table is empty, so I can't mount the drive from a floppy boot.

dmesg is like this: (my comments)

hda: ST34321A, ... (pri master)
hdb: ST34321A, ... (pri slave)
hdc: FX4010M, ATAPI CD/DVD-ROM drive (secnd master)
hdd: ST320420A, ... (secnd slave)
ide0 at ... (ide pri chain)
ide1 at ... (ide secnd chain)
hda: 8404830 sectors ... (good)
hdb: 8404830 sectors ... (good)
hdd: 39851760 sectors ... (good)
ide-floppy driver ... (ok)
Partition check: (<---<<< this is where it gets interesting)
 hda:
 hdb:
 hdd: hdd1 hdd2 hdd3 (<---<<< that's right, hdd is now the boot drive. Even if I boot without the floppy, hdd is the boot drive.)

Any suggestions?

Reed Scarce
Systems Engineer
CACI, Inc.
1100 N. Glebe Rd
Arlington, VA 22201
(703) 841-3045
-------------- next part --------------
An HTML attachment was scrubbed...
URL: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/attachments/20031216/498124c7/attachment-0001.html

From ShiYi.Yue at astrazeneca.com Tue Dec 16 14:05:46 2003
From: ShiYi.Yue at astrazeneca.com (ShiYi.Yue at astrazeneca.com)
Date: Tue, 16 Dec 2003 23:05:46 +0100
Subject: [Rocks-Discuss]hardware compability check wirh Rocks 3.00
Message-ID: <D2A2B86E8730D711B8560008028AC980257A2E@camrd9.camrd.astrazeneca.net>

hi,

I was wondering if there is a way to set up a hardware compatibility check in the kickstart of Rocks, to give us an opportunity to add the drivers once incompatible hardware is detected.

I have some PCs with Broadcom Gbit 10/100/1000 network cards, and it looks like Rocks 3.0 was not happy with these network cards. The only thing I can do now (without rebuilding the distribution) is to replace these cards. I am afraid this type of situation will happen again and again since RH7.3 is getting older and older. I hope I am wrong and someone can point me to a solution.

Shi-Yi
shiyi.yue at astrazeneca.com

From mjk at sdsc.edu Tue Dec 16 14:55:38 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Tue, 16 Dec 2003 14:55:38 -0800
Subject: [Rocks-Discuss]hardware compability check wirh Rocks 3.00
In-Reply-To: <D2A2B86E8730D711B8560008028AC980257A2E@camrd9.camrd.astrazeneca.net>
References: <D2A2B86E8730D711B8560008028AC980257A2E@camrd9.camrd.astrazeneca.net>
Message-ID: <[email protected]>

We've been thinking about this off and on for over a year -- it's a pretty hard problem. The real trick to supporting all hardware is keeping the boot kernel current. We've let our releases get old and more and more people are seeing hardware support issues.

Rocks 3.1 (out next week) will include the latest RedHat kernel from RHEL 3.0. This will fix most of the hardware support issues out there. When we release, please download 3.1 and try it with your hardware; if it still fails, please let us know. Thanks.

-mjk

On Dec 16, 2003, at 2:05 PM, ShiYi.Yue at astrazeneca.com wrote:

> hi,
>
> I was wondering if there is a way to set a hardware compability check
> in the kickstart of Rocks, and give us an oppotunity to add the drvers
> once the uncompatible hardware was detected.
>
> I have some PCs with Broadcom Gbit 10/100/1000 network cards, It looks
> Rocks 3.0 was not happy with these network cards. The only way I can
> do now (without rebuild the distribution) is to replace these cards.
> I am afraid this type of situation will happen again and again since
> RH7.3 is getting older and older.
> I hope I were wrong and someone can point me a solution.
> Shi-Yi
> shiyi.yue at astrazeneca.com

From msherman at informaticscenter.info Tue Dec 16 16:25:45 2003
From: msherman at informaticscenter.info (Mark Sherman)
Date: Tue, 16 Dec 2003 17:25:45 -0700
Subject: [Rocks-Discuss]RE: Rocks-Discuss] AMD Opteron - Contact Appro
Message-ID: <[email protected]>

Hello, I'm an administrator on a pure i386 cluster under Rocks 3.0.0, and our clients are pushing us to include some Opteron nodes. I'm trying to find out the feasibility of such an addition. I know there's been a lot of talk about Opterons on the rocks list, so I'm wondering if someone can give a boiled-down can-do / can't-do / maybe-but-we-haven't-tested-it-yet kind of status. With that, I'd say I'm probably willing to be a pseudo-beta site and give feedback on how the system works.

Thank you very much, and keep up the good work. I love the Rocks system.

~M

______________________________________________
Mark Sherman
Computing Systems Administrator
Informatics Center
Massachusetts Biomedical Initiatives
Worcester MA 01605
508-797-4200
msherman at informaticscenter.info
----------------------~-----------------------

> -------- Original Message --------
> Subject: [Rocks-Discuss]RE: Rocks-Discuss] AMD Opteron - Contact Appro
> From: "Jian Chang" <jian at appro.com>
> Date: Fri, December 12, 2003 6:27 pm
> To: "Bryan Littlefield" <bryan at UCLAlumni.net>,
> npaci-rocks-discussion at sdsc.edu, mjk at sdsc.edu
>
> Hello Mason / Puru,
>
> I got your contact information from Bryan Littlefield.
> I would like to discuss with you regarding benchmark test systems you
> might need down the road.
> We can also share with you our findings as to what is compatible in the
> Opteron systems.
> Please reply with your phone number where I can reach you, and I will
> call promptly.
>
> Bryan,
>
> Thank you for the referral.
>
> Best regards,
>
> Jian Chang
> Regional Sales Manager
> (408) 941-8100 x 202
> (800) 927-5464 x 202
> (408) 941-8111 Fax
> jian at appro.com
> www.appro.com
>
> -----Original Message-----
> From: Bryan Littlefield [mailto:bryan at UCLAlumni.net]
> Sent: Tuesday, December 09, 2003 12:14 PM
> To: npaci-rocks-discussion at sdsc.edu; mjk at sdsc.edu
> Cc: Jian Chang
> Subject: Rocks-Discuss] AMD Opteron - Contact Appro
>
> Hi Mason,
>
> I suggest contacting Appro. We are using Rocks on our Opteron cluster
> and Appro would likely love to help. I will contact them as well to see
> if they could help getting a opteron machine for testing. Contact info
> below:
>
> Thanks --Bryan
>
> Jian Chang - Regional Sales Manager
> (408) 941-8100 x 202
> (800) 927-5464 x 202
> (408) 941-8111 Fax
> jian at appro.com
> http://www.appro.com
>
> npaci-rocks-discussion-request at sdsc.edu wrote:
>
> From: "Mason J. Katz" <mjk at sdsc.edu>
> Subject: Re: [Rocks-Discuss]AMD Opteron
> Date: Tue, 9 Dec 2003 07:28:51 -0800
> To: "purushotham komaravolu" <purikk at hotmail.com>
>
> We have a beta right now that we have sent to a few people. We plan on
> a release this month, and AMD_64 will be part of this release along
> with the usual x86, IA64 support.
>
> If you want to help accelerate this process please talk to your vendor
> about loaning/giving us some hardware for testing. Having access to a
> variety of Opteron hardware (we own two boxes) is the only way we can
> have good support for this chip.
>
> -mjk
>
> On Dec 8, 2003, at 8:23 PM, purushotham komaravolu wrote:
>
> Cc: <npaci-rocks-discussion at sdsc.edu>
>
> Hello,
> I am a newbie to ROCKS cluster. I wanted to setup clusters on
> 32-bit Architectures( Intel and AMD) and 64-bit Architecture( Intel
> and AMD).
> I found the 64-bit download for Intel on the website but not for AMD.
> Does it work for AMD opteron? if not what is the ETA for AMD-64.
> We are planning to buy AMD-64 bit machines shortly, and I would like
> to volunteer for the beta testing if needed.
> Thanks
> Regards,
> Puru
>
> _______________________________________________
> npaci-rocks-discussion mailing list
> npaci-rocks-discussion at sdsc.edu
> http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion
>
> End of npaci-rocks-discussion Digest

From fds at sdsc.edu Tue Dec 16 18:04:47 2003
From: fds at sdsc.edu (Federico Sacerdoti)
Date: Tue, 16 Dec 2003 18:04:47 -0800
Subject: [Rocks-Discuss]a minor ganglia question
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

Dan,

Good question. Unfortunately this behavior is hardwired into stock Ganglia, not the Rocks-specific pages that we have more control over.

The good news is that I wrote the code for this page :) It's easy to fix if you would like to do it yourself.

Edit the file /var/www/html/ganglia/functions.php. On line 386, you should see:

krsort($racks[$rack]);

To get the ordering you desire, change this to:

ksort($racks[$rack]);

That's it. You should see the high-numbered compute nodes at the bottom of the rack. I will see if we can get a config file button on the page to give this option for a later release of Ganglia.
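Federico's one-line change can also be applied with a small script rather than by hand. This is only a sketch: the path and the exact krsort line are as he describes for Rocks 3.0.0, so verify them against your own functions.php (and keep a backup) before running it.

```python
def patch_ganglia_sort(path):
    """Swap krsort for ksort in Ganglia's physical-view code so
    high-numbered compute nodes render at the bottom of the rack.
    Returns True if the file was changed, False otherwise."""
    with open(path) as f:
        source = f.read()
    patched = source.replace("krsort($racks[$rack]);",
                             "ksort($racks[$rack]);")
    if patched != source:
        with open(path, "w") as f:
            f.write(patched)
    return patched != source
```

Usage would be `patch_ganglia_sort("/var/www/html/ganglia/functions.php")`; a False return means the expected krsort line was not found and the file should be inspected manually.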

-Federico

On Dec 16, 2003, at 11:45 AM, Dan Wright wrote:

> Hello all,
>
> I'm in the process of setting up a 3.0.0 cluster and have a question
> about the "Physical view" in ganglia. In this view (which is quite
> cool BTW :) is shows higher-numbered nodes on top and lower-numbered
> nodes on bottom:
>
> compute-0-12
> ...
> compute-0-2
> compute-0-1
> compute-0-0
>
> and my cluster is physically reversed from that:
>
> compute-0-0
> compute-0-1
> compute-0-2
> ...
> compute-0-12
>
> Is there an easy way to switch this display around so it matches the
> real physical layout? I poked around and ganglia for a few minutes and
> didn't see anything obvious, so I thought I'd ask before I actually
> start wasting time on this :)
>
> Thanks,
>
> - Dan Wright
> (dtwright at uiuc.edu)
> (http://www.scs.uiuc.edu/)
> (UNIX Systems Administrator, School of Chemical Sciences, UIUC)
> (333-1728)

Federico

Rocks Cluster Group, San Diego Supercomputing Center, CA

From csamuel at vpac.org Tue Dec 16 18:49:22 2003
From: csamuel at vpac.org (Chris Samuel)
Date: Wed, 17 Dec 2003 13:49:22 +1100
Subject: [Rocks-Discuss]a minor ganglia question
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wed, 17 Dec 2003 06:45 am, Dan Wright wrote:

> Is there an easy way to switch this display around so it matches the real
> physical layout?

I think this is why they tell you to install the compute nodes from the bottom of the rack. :-)

cheers,
Chris
- --
Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin
Victorian Partnership for Advanced Computing http://www.vpac.org/
Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE/38QyO2KABBYQAh8RAo+vAJ0XcP6tBJpwjxYnicEQkysRslWmmQCcDpeb
K8bNCLgiF5umMiJ/59ICN70=
=57YJ
-----END PGP SIGNATURE-----

From hermanns at tupi.dmt.upm.es Wed Dec 17 00:08:19 2003
From: hermanns at tupi.dmt.upm.es (Miguel Hermanns)
Date: Wed, 17 Dec 2003 09:08:19 +0100
Subject: [Rocks-Discuss]Creation of a hardware compatibility list?
Message-ID: <[email protected]>

Since one of the strong features of Rocks is the possibility of fast deployment of clusters, wouldn't it be of interest to create a hardware compatibility list on the web page of Rocks? This list could be filled in by the users of Rocks with their experience and the hardware they have. In this way somebody interested in building a cluster as fast as possible could check the list and buy something absolutely 100% compatible with Rocks.

I know that in principle one could check the compatibility list of RH, but my own experience was negative in that aspect (I installed an Adaptec IDE RAID controller, supported by RH7.3, but Rocks 2.3 was unable to recognize it).

Miguel

From mjk at sdsc.edu Wed Dec 17 09:03:00 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Wed, 17 Dec 2003 09:03:00 -0800
Subject: [Rocks-Discuss]Creation of a hardware compatibility list?
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

We have thought about this, and have some ideas on how to setup a useful page. Something like the old Linux laptop hardware list but simpler to mine for data. It's been on our long list of things to do for a while now :)

-mjk

On Dec 17, 2003, at 12:08 AM, Miguel Hermanns wrote:

> Since one of the strong features of Rocks is the posibility of fast
> deployment of clusters, wouldn't it be of interest to create a
> hardware compatibility list on the web page of Rocks? This list could
> be filled in by the users of Rocks with their experience and the
> hardware they have. In this way somebody interested in building a
> cluster as fast as possible could check the list and buy something
> absolutely 100% compatible with Rocks.
>
> I know that in principle one could check the compatibility list of RH,
> but my own experience was negative in that aspect (I installed an
> Adaptec IDE RAID controller, supported by RH7.3, but Rocks 2.3 was
> unable to recognize it).
>
> Miguel

From junkscarce at hotmail.com Wed Dec 17 09:31:21 2003
From: junkscarce at hotmail.com (Reed Scarce)
Date: Wed, 17 Dec 2003 17:31:21 +0000
Subject: [Rocks-Discuss]fidsk reports all zeros, need actual
Message-ID: <[email protected]>

Good ol' fdisk "print" on my compute node gives me a line:

Device Boot Start End Blocks Id System

but no data.

Extra Functionality's "print" reports:

Nr AF  Hd Sec Cyl  Hd Sec Cyl     Start      Size ID
 1 00   0   0   0   0   0   0         0         0  0
 2 00   0   0   0   0   0   0         0         0  0
 3 00   0   0   0   0   0   0         0         0  0
 4 00   0   0   0   0   0   0         0         0  0

How can I retrieve the information necessary for scripted information at node installation time?

TIA
--RRS


From dtwright at uiuc.edu Wed Dec 17 11:49:53 2003
From: dtwright at uiuc.edu (Dan Wright)
Date: Wed, 17 Dec 2003 13:49:53 -0600
Subject: [Rocks-Discuss]a minor ganglia question
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]>
Message-ID: <[email protected]>

Eh... whatever ;-) I started using rocks with 2.2.1 (when there was no physical layout display) and haven't read the manual again since :)

Chris Samuel said:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On Wed, 17 Dec 2003 06:45 am, Dan Wright wrote:
>
> > Is there an easy way to switch this display around so it matches the real
> > physical layout?
>
> I think this is why they tell you to install the compute nodes from the bottom
> of the rack. :-)
>
> cheers,
> Chris
> - --
> Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin
> Victorian Partnership for Advanced Computing http://www.vpac.org/
> Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.2.2 (GNU/Linux)
>
> iD8DBQE/38QyO2KABBYQAh8RAo+vAJ0XcP6tBJpwjxYnicEQkysRslWmmQCcDpeb
> K8bNCLgiF5umMiJ/59ICN70=
> =57YJ
> -----END PGP SIGNATURE-----

- Dan Wright
(dtwright at uiuc.edu)
(http://www.uiuc.edu/~dtwright)

-] ------------------------------ [-] -------------------------------- [-
``Weave a circle round him thrice, / And close your eyes with holy dread,
For he on honeydew hath fed, / and drunk the milk of Paradise.''
Samuel Taylor Coleridge, Kubla Khan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/attachments/20031217/a3718aef/attachment-0001.bin

From dtwright at uiuc.edu Wed Dec 17 11:51:00 2003
From: dtwright at uiuc.edu (Dan Wright)
Date: Wed, 17 Dec 2003 13:51:00 -0600
Subject: [Rocks-Discuss]a minor ganglia question
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]>
Message-ID: <[email protected]>

Federico,

Thanks! That'll make this easy enough... maybe next time I'll read the manual and install the machines in the rocks-recommended order as another poster suggested :)

Federico Sacerdoti said:
> Dan,
>
> Good question. Unfortunately this behavior is hardwired into stock
> Ganglia, not the Rocks-specific pages that we have more control over.
>
> The good news is that I wrote the code for this page :) Its easy to fix
> if you would like to do it yourself.
>
> Edit the file /var/www/html/ganglia/functions.php. On line 386, you
> should see:
>
> krsort($racks[$rack]);
>
> To get the ordering you desire, change this to:
>
> ksort($racks[$rack]);
>
> Thats it. You should see the high-numbered compute nodes at the bottom
> of the rack. I will see if we can get a config file button on the page
> to give this option for a later release of Ganglia.
>
> -Federico
>
> On Dec 16, 2003, at 11:45 AM, Dan Wright wrote:
>
> > Hello all,
> >
> > I'm in the process of setting up a 3.0.0 cluster and have a question
> > about the "Physical view" in ganglia. In this view (which is quite
> > cool BTW :) is shows higher-numbered nodes on top and lower-numbered
> > nodes on bottom:
> >
> > compute-0-12
> > ...
> > compute-0-2
> > compute-0-1
> > compute-0-0
> >
> > and my cluster is physically reversed from that:
> >
> > compute-0-0
> > compute-0-1
> > compute-0-2
> > ...
> > compute-0-12
> >
> > Is there an easy way to switch this display around so it matches the
> > real physical layout? I poked around and ganglia for a few minutes and
> > didn't see anything obvious, so I thought I'd ask before I actually
> > start wasting time on this :)
> >
> > Thanks,
> >
> > - Dan Wright
> > (dtwright at uiuc.edu)
> > (http://www.scs.uiuc.edu/)
> > (UNIX Systems Administrator, School of Chemical Sciences, UIUC)
> > (333-1728)
>
> Federico
>
> Rocks Cluster Group, San Diego Supercomputing Center, CA

- Dan Wright
(dtwright at uiuc.edu)
(http://www.uiuc.edu/~dtwright)

-] ------------------------------ [-] -------------------------------- [-
``Weave a circle round him thrice, / And close your eyes with holy dread,
For he on honeydew hath fed, / and drunk the milk of Paradise.''
Samuel Taylor Coleridge, Kubla Khan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/attachments/20031217/620937b3/attachment-0001.bin

From bruno at rocksclusters.org Wed Dec 17 12:52:30 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Wed, 17 Dec 2003 12:52:30 -0800
Subject: [Rocks-Discuss]fidsk reports all zeros, need actual
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

> Good ol' fdisk "print" on my compute node give me a line:
> Device Boot Start End Blocks Id System
>
> but no data.
>
> Extra Functionality's "print" reports
> Nr AF Hd Sec Cyl Hd Sec Cyl Start Size ID
> 1 00 0 0 0 0 0 0 0 0 0
> 2 00 0 0 0 0 0 0 0 0 0
> 3 00 0 0 0 0 0 0 0 0 0
> 4 00 0 0 0 0 0 0 0 0 0
>
> How can I retrieve the information necessary for scripted information
> at node installation time?

this should answer your question:

https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-February/001388.html

- gb

From anand at novaglobal.com.sg  Wed Dec 17 20:14:45 2003
From: anand at novaglobal.com.sg (Anand Vaidya)
Date: Wed, 17 Dec 2003 23:14:45 -0500
Subject: [Rocks-Discuss]Creation of a hardware compatibility list?
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]>
Message-ID: <[email protected]>

Why not create a Wiki? A wiki is easy enough to install (60 seconds?) and just the right tool for user-driven projects like Rocks.

Nice examples of wiki webs are http://en.wikipedia.org/ or my favourite, the GentooServer project, which has a very nice wiki at http://www.subverted.net/wakka/wakka.php?wakka=MainPage (though not related to clustering).

Regards,
Anand


On Wednesday 17 December 2003 12:03, Mason J. Katz wrote:
> We have thought about this, and have some ideas on how to setup a
> useful page. Something like the old Linux laptop hardware list but
> simpler to mine for data. It's been on our long list of things to do
> for a while now :)
>
> -mjk
>
> On Dec 17, 2003, at 12:08 AM, Miguel Hermanns wrote:
> > Since one of the strong features of Rocks is the possibility of fast
> > deployment of clusters, wouldn't it be of interest to create a
> > hardware compatibility list on the web page of Rocks? This list could
> > be filled in by the users of Rocks with their experience and the
> > hardware they have. In this way somebody interested in building a
> > cluster as fast as possible could check the list and buy something
> > absolutely 100% compatible with Rocks.
> >
> > I know that in principle one could check the compatibility list of RH,
> > but my own experience was negative in that aspect (I installed an
> > Adaptec IDE RAID controller, supported by RH7.3, but Rocks 2.3 was
> > unable to recognize it).
> >
> > Miguel

-

From mjk at sdsc.edu  Thu Dec 18 08:02:14 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Thu, 18 Dec 2003 08:02:14 -0800
Subject: [Rocks-Discuss]Creation of a hardware compatibility list?
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]> <[email protected]>
Message-ID: <[email protected]>

I've been thinking about a rocks wiki for a few months now, but I'm a bit paranoid about the lack of authentication for updates (basically anyone can modify your site).

If there is interest out there, we could just set one up, leave it alone, and let our users worry about the content. Done well this could have information on:

- hardware issues
- bug reports
- feature requests
- contributed documentation (to be moved into our users manual)
- etc.

Basically a simple version of sourceforge (we have no plans to move to sourceforge -- the interface and bandwidth both stink). Ideas....?

-mjk

On Dec 17, 2003, at 8:14 PM, Anand Vaidya wrote:


> Why not create a Wiki? A wiki is easy enough to install (60 seconds?) and just
> the right tool for user-driven projects like Rocks.
>
> Nice examples of wiki webs are http://en.wikipedia.org/ or my favourite,
> the GentooServer project, which has a very nice wiki at
> http://www.subverted.net/wakka/wakka.php?wakka=MainPage (though not related to
> clustering)
>
> Regards,
> Anand
>
> On Wednesday 17 December 2003 12:03, Mason J. Katz wrote:
>> We have thought about this, and have some ideas on how to setup a
>> useful page. Something like the old Linux laptop hardware list but
>> simpler to mine for data. It's been on our long list of things to do
>> for a while now :)
>>
>> -mjk
>>
>> On Dec 17, 2003, at 12:08 AM, Miguel Hermanns wrote:
>>> Since one of the strong features of Rocks is the possibility of fast
>>> deployment of clusters, wouldn't it be of interest to create a
>>> hardware compatibility list on the web page of Rocks? This list could
>>> be filled in by the users of Rocks with their experience and the
>>> hardware they have. In this way somebody interested in building a
>>> cluster as fast as possible could check the list and buy something
>>> absolutely 100% compatible with Rocks.
>>>
>>> I know that in principle one could check the compatibility list of RH,
>>> but my own experience was negative in that aspect (I installed an
>>> Adaptec IDE RAID controller, supported by RH7.3, but Rocks 2.3 was
>>> unable to recognize it).
>>>
>>> Miguel

From hermanns at tupi.dmt.upm.es  Fri Dec 19 00:47:11 2003
From: hermanns at tupi.dmt.upm.es (Miguel Hermanns)
Date: Fri, 19 Dec 2003 09:47:11 +0100
Subject: [Rocks-Discuss]Creation of a hardware compatibility list?
Message-ID: <[email protected]>

>> I've been thinking about a rocks wiki for a few months now, but I'm a
>> bit paranoid about the lack of authentication for updates (basically
>> anyone can modify your site).

One possible filter could be that only the users of the registered clusters can modify the wiki (so that when you submit the data of the cluster you also include a user and a password), although in that case I would be excluded, since our cluster has not been able to work with Rocks yet :-(.

>> - hardware issues


>> - bug reports
>> - feature requests
>> - contributed documentation (to be moved into our users manual)
>> - etc

So for example the cluster register could be editable by the registered users (each one only its own entry) and could include a description of the installed hardware (not just the processor, but also the motherboard model, the hard disks, NICs, etc.). Then everybody interested in building a cluster could go to the register, have a look, and click on the different clusters that are similar to the one in mind. After that, with just a click, the user could review the hardware configuration and the encountered problems.

This would also be great when Rocks clusters get updated, because then their builders could go and update their entry without needing to submit an email to the Rocks team, hence avoiding giving them extra work.

In order to include the not-yet-working Rocks clusters, the database of clusters (with the corresponding users and passwords) could be extended with them, but their entries would not be shown on the Rocks register until they are fully working. In this way information on hardware incompatibilities can be collected and shown on a different part of www.rocksclusters.org.

The feature requests would still be handled through the mailing list, and for the contributed documentation I would place the source files in read-only mode on the ftp server; if somebody makes modifications to them, then the new version should be emailed to the persons in charge of the docs to give their approval.

Miguel

From jkreuzig at uci.edu  Fri Dec 19 16:58:58 2003
From: jkreuzig at uci.edu (James Kreuziger)
Date: Fri, 19 Dec 2003 16:58:58 -0800 (PST)
Subject: [Rocks-Discuss]Dell Power Connect 5224
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

Ok, I need some help here. I've managed to setup my frontend node, and it is up and running. I have my 8 nodes all connected up to a Dell Power Connect 5224. I can access the switch through a serial terminal and get a command line interface. The little lights on the front of the switch are blinking, so that's good.

However, I can't get the switch recognized by insert-ethers. I've even managed to change the IP of the switch through the CLI, but I can't see the switch from the frontend node. I can't telnet, get the web interface or anything. I haven't saved the configuration, so a reboot of the switch will reset the values.

I'm grasping at straws here. I'm not a network engineer, so I could use some help getting this thing configured.


If anybody can help me out, contact me by email.

Thanks,

-Jim

*************************************************
Jim Kreuziger
jkreuzig at uci.edu
949-824-4474
*************************************************

From tim.carlson at pnl.gov  Fri Dec 19 17:24:22 2003
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Fri, 19 Dec 2003 17:24:22 -0800 (PST)
Subject: [Rocks-Discuss]Dell Power Connect 5224
In-Reply-To: <[email protected]>
Message-ID: <[email protected]>

On Fri, 19 Dec 2003, James Kreuziger wrote:

I think we need a Rocks FAQ

https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-August/002762.html

You need to turn on fast-link.

> Ok, I need some help here. I've managed to setup my frontend node, and it is up and running. I have my 8 nodes all connected up to a Dell Power Connect 5224. I can access the switch through a serial terminal and get a command line interface. The little lights on the front of the switch are blinking, so that's good.
>
> However, I can't get the switch recognized by insert-ethers. I've even managed to change the IP of the switch through the CLI, but I can't see the switch from the frontend node. I can't telnet, get the web interface or anything. I haven't saved the configuration, so a reboot of the switch will reset the values.
>
> I'm grasping at straws here. I'm not a network engineer, so I could use some help getting this thing configured.
>
> If anybody can help me out, contact me by email.
>
> Thanks,
>
> -Jim
>
> *************************************************
> Jim Kreuziger
> jkreuzig at uci.edu
> 949-824-4474
> *************************************************



Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support

From Georgi.Kostov at umich.edu  Fri Dec 19 17:34:15 2003
From: Georgi.Kostov at umich.edu (Georgi Kostov)
Date: Fri, 19 Dec 2003 20:34:15 -0500
Subject: [Rocks-Discuss]Dell Power Connect 5224
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]>
Message-ID: <[email protected]>

Jim,

I have a 5224 here. What are your config settings on the switch? I.e. the IP, subnet mask, and gateway settings, for both the switch and the interface of the head-node to which the 5224 is connected (I assume it's on the private subnet, so the subnet is something like 10.0.0.0/255.0.0.0 with the frontend internal interface (eth0) as 10.0.1.1, right?)

One thing to try on the head node is to run (as root) "tcpdump -i eth0" and watch for packets. To avoid clutter, I would either turn the rest (compute nodes, etc.) off, or filter them out with settings on tcpdump.

With some more info we should be able to tease this out.

--Georgi

Michigan Center for Biological Information (MCBI)
University of Michigan
3600 Green Court, Suite 700
Ann Arbor, MI 48105-1570
Phone/Fax: (734) 998-9236/8571
kostov at umich.edu
www.ctaalliance.org

Quoting James Kreuziger <jkreuzig at uci.edu>:

> Ok, I need some help here. I've managed to setup my frontend node, and it is up and running. I have my 8 nodes all connected up to a Dell Power Connect 5224. I can access the switch through a serial terminal and get a command line interface. The little lights on the front of the switch are blinking, so that's good.
>
> However, I can't get the switch recognized by insert-ethers. I've even managed to change the IP of the switch through the CLI, but I can't see the switch from the frontend node. I can't telnet, get the web interface or anything. I haven't saved the configuration, so a reboot of the switch will reset the values.
>
> I'm grasping at straws here. I'm not a network engineer, so I could use some help getting this thing configured.
>
> If anybody can help me out, contact me by email.
>
> Thanks,
>
> -Jim
>
> *************************************************
> Jim Kreuziger
> jkreuzig at uci.edu
> 949-824-4474
> *************************************************

From daniel.kidger at quadrics.com  Mon Dec 22 01:45:47 2003
From: daniel.kidger at quadrics.com (Dan Kidger)
Date: Mon, 22 Dec 2003 09:45:47 +0000
Subject: Fwd: Re: [Rocks-Discuss]Dell Power Connect 5224
Message-ID: <[email protected]>

---------- Forwarded Message ----------

Subject: Re: [Rocks-Discuss]Dell Power Connect 5224
Date: Mon, 22 Dec 2003 09:38:41 +0000
From: Dan Kidger <daniel.kidger at quadrics.com>
To: Georgi Kostov <Georgi.Kostov at umich.edu>
Cc: paci-rocks-discussion at sdsc.edu

> Quoting James Kreuziger <jkreuzig at uci.edu>:
> > Ok, I need some help here. I've managed to setup my frontend node, and it is up and running. I have my 8 nodes all connected up to a Dell Power Connect 5224. I can access the switch through a serial terminal and get a command line interface. The little lights on the front of the switch are blinking, so that's good.
> >
> > However, I can't get the switch recognized by insert-ethers. I've even managed to change the IP of the switch through the CLI, but I can't see the switch from the frontend node. I can't telnet, get the web interface or anything. I haven't saved the configuration, so a reboot of the switch will reset the values.

I don't know much about the 5224 per se, but I do know that much of the time embedded devices *have* to be rebooted to pick up new settings for their IP.

Once done, I would try pinging the switch's IP and then doing 'arp -a' to see its MAC address (which should match the one on the white sticky label on the back).


Daniel.

--------------------------------------------------------------
Dr. Dan Kidger, Quadrics Ltd.          daniel.kidger at quadrics.com
One Bridewell St., Bristol, BS1 2AA, UK            0117 915 5505
----------------------- www.quadrics.com --------------------

-------------------------------------------------------

--
Yours,
Daniel.

--------------------------------------------------------------
Dr. Dan Kidger, Quadrics Ltd.          daniel.kidger at quadrics.com
One Bridewell St., Bristol, BS1 2AA, UK            0117 915 5505
----------------------- www.quadrics.com --------------------

From daniel.kidger at quadrics.com  Mon Dec 22 09:03:56 2003
From: daniel.kidger at quadrics.com (daniel.kidger at quadrics.com)
Date: Mon, 22 Dec 2003 17:03:56 -0000
Subject: [Rocks-Discuss]RE:Writing a Roll ?
Message-ID: <[email protected]>

Folks, I have made good headway in adding software and its configuration using extend-compute.xml and now have a robust system. (the head node install is still rather manual though :-( )

I would now like to move to doing this as a Roll. However I am not sure of the best way of proceeding - there appears to be little documentation, either HOWTO-style or on the underlying concepts.

I have mounted the HPC_roll.iso and browsed around:
- the image seems to consist of 2 subdirectories, in the same style as RedHat CDs
- as expected, ./SRPMS contains the source RPMs, and ./RedHat/RPMS contains binary RPMs (the latter contains many more RPMs than there are SRPMs for)

There is no obvious configuration information until you explore roll-hpc-kickstart-3.0.0-0.noarch.rpm. This seems to contain lots of XML which at first glance is hard to decipher.

So my question is: Should we be writing our own rolls, and if so how? (examples?)

Yours,
Daniel.

--------------------------------------------------------------
Dr. Dan Kidger, Quadrics Ltd.          daniel.kidger at quadrics.com
One Bridewell St., Bristol, BS1 2AA, UK            0117 915 5505
----------------------- www.quadrics.com --------------------


From daniel.kidger at quadrics.com  Mon Dec 22 09:08:21 2003
From: daniel.kidger at quadrics.com (daniel.kidger at quadrics.com)
Date: Mon, 22 Dec 2003 17:08:21 -0000
Subject: [Rocks-Discuss]shucks.
Message-ID: <[email protected]>

# rpm -ql roll-hpc-kickstart |xargs -l grep -inH sucks

/export/home/install/profiles/current/nodes/force-smp.xml:21: IBM sucks
/export/home/install/profiles/current/nodes/ganglia-server.xml:134: perl sucks
/export/home/install/profiles/current/nodes/ganglia-server.xml:148: Switch from ISC to RedHat's pump. Pump sucks but it is standard so
/export/home/install/profiles/current/nodes/sendmail-masq.xml:31: m4 sucks

:-)

Have a good Christmas,
Daniel.

--------------------------------------------------------------
Dr. Dan Kidger, Quadrics Ltd.          daniel.kidger at quadrics.com
One Bridewell St., Bristol, BS1 2AA, UK            0117 915 5505
----------------------- www.quadrics.com --------------------

From fds at sdsc.edu  Mon Dec 22 10:22:54 2003
From: fds at sdsc.edu (Federico Sacerdoti)
Date: Mon, 22 Dec 2003 10:22:54 -0800
Subject: [Rocks-Discuss]RE:Writing a Roll ?
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

You are right, we have little documentation on creating new rolls. I have lamented to Greg about this, and he has done the same to me. Basically we have been so busy trying to get the 3.1.0 release out that we haven't put our nose to the grindstone about the Developer docs.

Here is a little primer since it sounds like you are indeed ready.

1. The first thing to realize is that rolls are not built from "scratch", but from the safe confines of our build environment. This environment is the directory:

[your local rocks CVS sandbox]/src/roll/

You must check out the Rocks CVS tree to get this. Instructions on how to do this (anonymously) are at http://cvs.rocksclusters.org/.

Once you have this build environment on your frontend system, you are ready for the next step to building your roll. You should make a new directory here called "quadrics" - the name matters as it will be the identifier for your roll from now on.


2. Now the best thing I can tell you is to look at the "hpc" and "sge" rolls (two of our most mature) for the directory structure in "quadrics". It's fairly straightforward, and mirrors what we do for the base. The "nodes" directory will hold your "extend-compute.xml", etc. (more on this later). The "roll-quadrics-kickstart.noarch.rpm" is made automatically for you from information in these directories.

3. The "src" dir holds anything you need to compile. Anything in src should deposit an RPM package in the "RPMS" directory when its build is finished.

4. You type "make roll" to start the build process. It will take a bit of study for you to get things correct, but suffice it to say that you will have an iso file suitable for burning when you are done. Thank bruno for this sweet fact - everything is automatic except your intellectual property :)

One more word on your XML files. Our philosophy of rolls is not to use the "extend/replace" strategy that we advocate for customization. As a roll builder, you are at the grass-roots level, and can rise above simple customization techniques.

Your roll should define a "quadrics.xml" node in the kickstart graph. You define the node in the file "roll/quadrics/nodes/quadrics.xml" and the edges in the file "roll/quadrics/graphs/default/quadrics.xml". Look at the SGE roll for a good example of this. By defining your configuration this way, you have more power to do complex tasks (different configuration for different appliance types), and to leave room for future growth.
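The node/graph layout described above can be sketched as a directory skeleton. This is a throwaway demonstration built in a temp directory; the "quadrics" name and file names follow the conventions in this thread, but check a real roll such as sge for the authoritative layout:

```shell
# Sketch of a minimal "quadrics" roll skeleton as described above.
# Built under mktemp so it is safe to run anywhere; a real roll lives
# in [your CVS sandbox]/src/roll/quadrics.
roll="$(mktemp -d)/quadrics"

mkdir -p "$roll/nodes" "$roll/graphs/default" "$roll/src" "$roll/RPMS"
touch "$roll/nodes/quadrics.xml"           # the kickstart node definition
touch "$roll/graphs/default/quadrics.xml"  # edges wiring the node into the graph

ls -R "$roll"
# a real build would then run "make roll" at the top of this directory
```

The point of the split is that the node file holds the configuration itself, while the graph file decides which appliance types receive it.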

Good luck, and we hope and pray for a good technical writer that will do this process justice.

-Federico

On Dec 22, 2003, at 9:03 AM, daniel.kidger at quadrics.com wrote:

> Folks,
> I have made good headway in adding software and its configuration using extend-compute.xml and now have a robust system. (the head node install is still rather manual though :-( )
>
> I would now like to move to doing this as a Roll. However I am not sure of the best way of proceeding - there appears to be little documentation, either HOWTO-style or on the underlying concepts.
>
> I have mounted the HPC_roll.iso and browsed around:
> - the image seems to consist of 2 subdirectories, in the same style as RedHat CDs
> - as expected, ./SRPMS contains the source RPMs, and ./RedHat/RPMS contains binary RPMs
> (the latter contains many more RPMs than there are SRPMs for)
>
> There is no obvious configuration information until you explore:
> roll-hpc-kickstart-3.0.0-0.noarch.rpm
> This seems to contain lots of XML which at first glance is hard to decipher.
>
> So my question is:
> Should we be writing our own rolls, and if so how? (examples?)
>
> Yours,
> Daniel.
>
> --------------------------------------------------------------
> Dr. Dan Kidger, Quadrics Ltd.          daniel.kidger at quadrics.com
> One Bridewell St., Bristol, BS1 2AA, UK            0117 915 5505
> ----------------------- www.quadrics.com --------------------

Federico

Rocks Cluster Group, San Diego Supercomputing Center, CA

From mjk at sdsc.edu  Mon Dec 22 11:07:32 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Mon, 22 Dec 2003 11:07:32 -0800
Subject: [Rocks-Discuss]shucks.
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

If these are the worst CVS log comments you've found, you aren't looking very hard. The only one here I'm compelled to clarify is IBM. There are around 3-5 ways of probing the chipset to determine if the box is SMP; RedHat supports the most common ones, which everyone in the world except IBM uses. This forced us to patch anaconda to detect SMP for IBM hardware (or in this case just force it) -- didn't these guys invent the PC?

-mjk

On Dec 22, 2003, at 9:08 AM, daniel.kidger at quadrics.com wrote:

> # rpm -ql roll-hpc-kickstart |xargs -l grep -inH sucks
>
> /export/home/install/profiles/current/nodes/force-smp.xml:21: IBM sucks
> /export/home/install/profiles/current/nodes/ganglia-server.xml:134: perl sucks
> /export/home/install/profiles/current/nodes/ganglia-server.xml:148: Switch from ISC to RedHat's pump. Pump sucks but it is standard so
> /export/home/install/profiles/current/nodes/sendmail-masq.xml:31: m4 sucks
>
> :-)
>
> Have a good Christmas,
> Daniel.
>
> --------------------------------------------------------------
> Dr. Dan Kidger, Quadrics Ltd.          daniel.kidger at quadrics.com
> One Bridewell St., Bristol, BS1 2AA, UK            0117 915 5505
> ----------------------- www.quadrics.com --------------------

From mjk at sdsc.edu  Mon Dec 22 11:13:30 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Mon, 22 Dec 2003 11:13:30 -0800
Subject: [Rocks-Discuss]RE:Writing a Roll ?
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

http://cvs.rocksclusters.org

In the rocks/src/roll directory you can see several roll examples, all of which are built by typing "make roll". The roll-*-kickstart.*.noarch.rpm is the real magic: it includes the XML profiles that are grafted onto the base kickstart graph.

-mjk

On Dec 22, 2003, at 9:03 AM, daniel.kidger at quadrics.com wrote:

> Folks,
> I have made good headway in adding software and its configuration using extend-compute.xml and now have a robust system. (the head node install is still rather manual though :-( )
>
> I would now like to move to doing this as a Roll. However I am not sure of the best way of proceeding - there appears to be little documentation, either HOWTO-style or on the underlying concepts.
>
> I have mounted the HPC_roll.iso and browsed around:
> - the image seems to consist of 2 subdirectories, in the same style as RedHat CDs
> - as expected, ./SRPMS contains the source RPMs, and ./RedHat/RPMS contains binary RPMs
> (the latter contains many more RPMs than there are SRPMs for)
>
> There is no obvious configuration information until you explore:
> roll-hpc-kickstart-3.0.0-0.noarch.rpm
> This seems to contain lots of XML which at first glance is hard to decipher.
>
> So my question is:
> Should we be writing our own rolls, and if so how? (examples?)
>
> Yours,
> Daniel.
>
> --------------------------------------------------------------
> Dr. Dan Kidger, Quadrics Ltd.          daniel.kidger at quadrics.com
> One Bridewell St., Bristol, BS1 2AA, UK            0117 915 5505
> ----------------------- www.quadrics.com --------------------


From daniel.kidger at quadrics.com  Mon Dec 22 11:12:17 2003
From: daniel.kidger at quadrics.com (daniel.kidger at quadrics.com)
Date: Mon, 22 Dec 2003 19:12:17 -0000
Subject: [Rocks-Discuss]RE:Writing a Roll ?
Message-ID: <[email protected]>

Federico,

> Here is a little primer since it sounds like you are indeed ready.
> --- many very informative lines deleted ---

Thanks for that long reply. :-)
I am currently pulling a copy of the source tree from cvs.rocksclusters.org (194MB of rocks/doc alone!)

Just a couple of questions for now:

1. Do rolls have to be CD-based?
(During development I would probably get through a lot of CDROMs - but more importantly it would get a bit fiddly to keep walking round to the CD-writer, then nipping off to the room with the cluster in every time.)

2. Do I have to reinstall the headnode from scratch each time I want to test a roll?
(Even if the roll only affects RPMs that get installed on compute nodes.)

3. Can a CD contain multiple rolls?
(Once mature, a cluster may have quite a few rolls: pbs, sge, gm, IB, etc., and Quadrics would probably have two - the (open-source) hardware drivers, MPI, etc. and also RMS - our (closed-source) cluster Resource Manager.)

4. What subset of the cvs tree does a Roll developer need? The whole tree is clearly rather excessive.

5. I am a little concerned about the amount of bloat needed to install our five RPMs as a Roll. (The RPMs are already prebuilt by our own internal build procedures.)
So taking another case - let's say the Intel Compilers. These have 4 RPMs (plus a little sed-ery of their config files and pasting in the license file). Would these be best installed as a Roll, or as a simple extend-compute.xml as I have currently?

Yours,
Daniel.

--------------------------------------------------------------
Dr. Dan Kidger, Quadrics Ltd.          daniel.kidger at quadrics.com
One Bridewell St., Bristol, BS1 2AA, UK            0117 915 5505
----------------------- www.quadrics.com --------------------

From sjenks at uci.edu  Mon Dec 22 11:17:07 2003
From: sjenks at uci.edu (Stephen Jenks)
Date: Mon, 22 Dec 2003 11:17:07 -0800
Subject: [Rocks-Discuss]rocks-dist suggestion
Message-ID: <[email protected]>


Hi ROCKS folks,

Just a suggestion for when you guys are bored after the 3.1 release 8-)

I ran into some trouble installing some updates to a ROCKS 3.0 cluster that could easily be solved with some checking in rocks-dist:

I put the openssh and other updates in the proper contrib directory under /home/install and ran "rocks-dist dist", which properly updated the distribution.

The problem occurred when I tried to reload the compute nodes - the install failed when it hit any of the RPMs in the contrib directory. It turns out the permissions on those RPMs were set to 600 because I had copied them out of root's home directory, so they couldn't be read by the server to send them down to the compute nodes. After fixing the permissions, all was well.

So rocks-dist should check (and possibly fix) permissions on files that will be included in the kickstart distribution. I realize that the mistake was entirely mine, but I'm probably not the only one to ever forget to set permissions correctly and the tool could easily catch such mistakes.
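The check being suggested here can be sketched in a few lines of shell. The demo below builds a throwaway "contrib" tree with one unreadable RPM, detects it, and fixes it; in real use you would point the find commands at your kickstart distribution (e.g. the contrib directory under /home/install). The package filename is made up for illustration:

```shell
# Demonstration of the permissions check rocks-dist could perform itself.
contrib=$(mktemp -d)                      # stand-in for the contrib directory
touch "$contrib/openssh-update-1.0-1.i386.rpm"
chmod 600 "$contrib/openssh-update-1.0-1.i386.rpm"

# flag RPMs that are not world-readable (apache cannot serve these)
find "$contrib" -name '*.rpm' ! -perm -004 -print

# repair: make every such RPM mode 644, then re-check (prints nothing)
find "$contrib" -name '*.rpm' ! -perm -004 -exec chmod 644 {} \;
find "$contrib" -name '*.rpm' ! -perm -004 -print

rm -rf "$contrib"
```

`-perm -004` matches files with the others-read bit set, so the negated test reports exactly the packages the HTTP server would fail on.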

Thanks for putting together such a useful cluster distribution!

Steve Jenks

From msherman at informaticscenter.info  Mon Dec 22 11:50:03 2003
From: msherman at informaticscenter.info (Mark Sherman)
Date: Mon, 22 Dec 2003 12:50:03 -0700
Subject: [Rocks-Discuss]MPI and memory + node rescue
Message-ID: <[email protected]>

Just for future consideration... any time I need to look at a system without booting it, or check its ability to boot, I just throw in the Knoppix CD: www.knoppix.org
______________________________________________
Mark Sherman
Computing Systems Administrator
Informatics Center
Massachusetts Biomedical Initiatives
Worcester MA 01605
508-797-4200
msherman at informaticscenter.info
----------------------~-----------------------

> -------- Original Message --------
> Subject: Re: [Rocks-Discuss]MPI and memory + node rescue
> From: "Trond SAUE" <saue at quantix.u-strasbg.fr>
> Date: Thu, November 27, 2003 1:38 am
> To: "Stephen P. Lebeau" <lebeau at openbiosystems.com>
> Cc: npaci-rocks-discussion at sdsc.edu
>
> On 2003.11.26 16:52, Stephen P. Lebeau wrote:


> > If you go here, they talk about creating a Linux floppy
> > repair disk. Make sure to read the README file... they
> > require that you make a 1.68MB floppy (README explains how)
> >
> > http://www.tux.org/pub/people/kent-robotti/looplinux/rip/
> >
> > If that doesn't work...
> >
> > http://www.toms.net/rb/download.html
> >
> > I've actually used this one before.
> >
> > -S
>
> In order to have a look at the disk of my crashed node, I downloaded RIP-2.2-1680.bin from the first site, but I was not able to boot properly. However, tomsrtbt-2.0.103 from the second site worked very well and allowed me to reboot the node as well as mount its disk to look at messages. Unfortunately, they did not really tell me anything more... However, it might be an idea for a future release of ROCKS to include a second "standalone" boot option for the compute nodes, so that one can access them independent of the frontend....
> All the best,
> Trond Saue
> --
> Trond SAUE (DIRAC: http://dirac.chem.sdu.dk/)
> Laboratoire de Chimie Quantique et Modélisation Moléculaire
> Universite Louis Pasteur ; 4, rue Blaise Pascal ; F-67000 STRASBOURG
> tél: 03 90 24 13 01  fax: 03 90 24 15 89  email: saue at quantix.u-strasbg.fr

From daniel.kidger at quadrics.com  Mon Dec 22 11:51:16 2003
From: daniel.kidger at quadrics.com (daniel.kidger at quadrics.com)
Date: Mon, 22 Dec 2003 19:51:16 -0000
Subject: [Rocks-Discuss]rocks-dist suggestion
Message-ID: <[email protected]>

> Just a suggestion for when you guys are bored after the 3.1 release 8-)

> The problem occurred when I tried to reload the compute nodes - the install failed when it hit any of the RPMs in the contrib directory. It turns out the permissions on those RPMs were set to 600 because I had copied them out of root's home directory, so they couldn't be read by the server to send them down to the compute nodes. After fixing the permissions, all was well.

This is a 'me-too' reply.

Rocks reads the RPMs using HTTP, hence they need to be readable by user apache. With symlinks it is all too easy, even if the RPMs themselves are 644, for the directory tree to be somewhere not walkable by a third-party userid like apache.
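A quick demonstration of this trap: the RPM itself can be 644 and still be unservable, because a parent directory lacks the execute bit for "others". The fixture below is a throwaway temp tree; in practice you would point the find at the real tree under your install directory:

```shell
# A 644 RPM hidden behind a directory apache cannot traverse.
top=$(mktemp -d)
mkdir -p "$top/private/RPMS"
touch "$top/private/RPMS/pkg-1.0-1.i386.rpm"
chmod 644 "$top/private/RPMS/pkg-1.0-1.i386.rpm"
chmod 755 "$top" "$top/private/RPMS"
chmod 700 "$top/private"          # the culprit: no o+x for apache

# report any directory a third-party uid cannot descend through
find "$top" -type d ! -perm -001 -print

rm -rf "$top"
```

The missing others-execute bit on any single directory in the path is enough to break the HTTP fetch, which is why checking only the RPM file modes is not sufficient.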

Yours,
Daniel.

--------------------------------------------------------------
Dr. Dan Kidger, Quadrics Ltd.          daniel.kidger at quadrics.com
One Bridewell St., Bristol, BS1 2AA, UK            0117 915 5505
----------------------- www.quadrics.com --------------------


From fds at sdsc.edu  Mon Dec 22 15:26:01 2003
From: fds at sdsc.edu (Federico Sacerdoti)
Date: Mon, 22 Dec 2003 15:26:01 -0800
Subject: [Rocks-Discuss]RE:Writing a Roll ?
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

On Dec 22, 2003, at 11:12 AM, daniel.kidger at quadrics.com wrote:

> Federico,
>
>> Here is a little primer since it sounds like you are indeed ready.
>> --- many very informative lines deleted ---
>
> Just a couple of questions for now:
> 1. Do rolls have to be CD-based?
> (during development I would probably get through a lot of CDROMs - but more importantly it would get a bit fiddly to keep walking round to the CD-writer, then nipping off to the room with the cluster in every time)

For distribution, the rolls should probably be CD-based. For development, however, that is not necessary. There is a make target which will compile your source and "install" the roll into your local distribution. This is "make intodist", and it assumes you are building on a frontend node. You would follow this call with a call to "rocks-dist dist" in the "/home/install" directory.

Of course, this makes most sense for rolls that affect compute nodes. To test parts of your roll that affect frontend functionality, you still need to use the CDs.
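Putting the steps above together, a development loop might look like the sketch below. The ROLL_DIR path is an assumption (wherever your CVS sandbox lives), and the block is guarded so the Rocks-specific commands only run when they actually exist on the machine:

```shell
# Hypothetical edit/test cycle for compute-node-only roll changes,
# following the "make intodist" route described above.
ROLL_DIR=${ROLL_DIR:-"$HOME/rocks/src/roll/quadrics"}   # assumed sandbox path

if command -v rocks-dist >/dev/null 2>&1 && [ -d "$ROLL_DIR" ]; then
    (cd "$ROLL_DIR" && make intodist)     # build the roll into the local distro
    (cd /home/install && rocks-dist dist) # regenerate the kickstart tree
    # then reinstall a compute node to verify the change took effect
else
    echo "rocks-dist not found: run this on a Rocks frontend"
fi
```

The guard also makes the sketch safe to paste on a non-Rocks machine while reading along.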

> 2. Do I have to reinstall the headnode from scratch each time I want to test a roll?
> (even if the roll only affects RPMs that get installed on compute nodes)

See comment above. We're working on a way to fully install frontends over the network, but it will not make it into the new release.

> 3. Can a CD contain multiple rolls?
> (Once mature - a cluster may have quite a few rolls: pbs, sge, gm,
> IB, etc.
> and Quadrics would probably have two - the (open-source) hardware
> drivers, MPI, etc. and also RMS - our (closed-source) cluster Resource


> Manager.)

There is some support for this; we call them "metarolls". We know they are important, and we have some support for them now. The build process for them is a bit different, and won't arrive for this release but soon after.

> 4. What subset of the cvs tree does a Roll developer need? The whole
> tree is clearly rather excessive.

There are definitely areas of the tree not necessary for roll building. It's always safest to have everything, but you're welcome to crop and test.

> 5. I am a little concerned about the amount of bloat needed to
> install our five RPMs as a Roll. (The RPMs are already prebuilt by our
> own internal build procedures).
> So taking another case - let's say the Intel Compilers - These have 4
> RPMs (plus a little sed-ery of their config files and pasting in the
> license file). Would these be best installed as a Roll or as a simple
> extend-compute.xml as I have currently?

It is better to put them in a roll. We have ways to combine, distribute, sort, etc. these rolls, and they form a nice capsule of software to introduce into the system. I understand that pulling the whole source tree seems a bit excessive, but it is rather standard practice for working on an open project.

Plus only the developer needs the source, the consumer does not.

Good luck, and we're glad someone is asking the questions. Rolls are intended for outside construction, and we need to document the process. :)

-Federico

> Yours,
> Daniel.
>
> --------------------------------------------------------------
> Dr. Dan Kidger, Quadrics Ltd.      daniel.kidger at quadrics.com
> One Bridewell St., Bristol, BS1 2AA, UK          0117 915 5505
> ----------------------- www.quadrics.com --------------------

Federico

Rocks Cluster Group, San Diego Supercomputing Center, CA

From tlinden at pcu.helsinki.fi Tue Dec 23 05:28:35 2003
From: tlinden at pcu.helsinki.fi (Tomas Lindén)
Date: Tue, 23 Dec 2003 15:28:35 +0200 (EET)
Subject: [Rocks-Discuss]Lost nodes during cluster-kickstart?
Message-ID: <[email protected]>

To reinstall a cluster I use the command


cluster-fork /boot/kickstart/cluster-kickstart

Now since all 32 nodes have been PXE installed this means that the reinstallation is performed by first doing a PXE-boot to load the installation kernel. My problem is that sometimes a few nodes fail during this reinstallation process. The failing nodes seem to be different whenever this problem occurs. The really strange thing is that after more than a day or so some nodes somehow manage to finish the reinstallation process!

Sometimes the whole cluster comes up fine without any lost node.

The problematic nodes _seem_ to get the installation kernel with PXE, so it might not be a PXE problem but something odd that happens later?

Has anyone seen anything like this before?

I'm aware of a bug in the RedHat installation kernel on Athlon systems when trying to run with a serial console: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-May/001988.html This is why I run the installation kernel without a serial console, but this makes debugging difficult because the serial console only shows output during the PXE boot process. No output is generated by the installation kernel itself. The next output is generated when the node has finished the installation and loads the final kernel, which runs fine with a serial console.

This is using Rocks 2.3.2 on a 32 node cluster with Tyan Tiger MPX S2466N-4M motherboards and dual Athlon MP CPUs with no graphics adapters, so the system has a 32 port serial console switch. The motherboards have integrated 100 Mb/s 3Com 3C920 NICs (in practice a 3C905 NIC). The switch is made by Enterasys. The frontend private NIC is also running at 100 Mb/s. When doing the cluster reinstallation the network bandwidth over the frontend NIC saturates at 12.5 MB/s. Maybe some packets are lost because of this?

The frontend private ethernet connection will be upgraded to Gb/s. Hopefully this will solve this reinstallation problem.

Do you have any other ideas how to solve this problem?

Best regards, Tomas Lindén
--------------------------------------------------------------------------
Tomas Linden                   Helsinki Institute of Physics (HIP)
Tomas.Linden at Helsinki.FI    P.O. Box 64 (Gustaf Hällströmin katu 2)
phone: +358-9-191 505 63       FIN-00014 UNIVERSITY OF HELSINKI
fax: +358-9-191 505 53         Finland
WWW: http://www.physics.helsinki.fi/~tlinden/eindex.html
--------------------------------------------------------------------------

From kjcruz at ece.uprm.edu Tue Dec 23 05:31:26 2003
From: kjcruz at ece.uprm.edu (Kennie Cruz)
Date: Tue, 23 Dec 2003 09:31:26 -0400 (AST)
Subject: [Rocks-Discuss]Error installing the compute node
Message-ID: <[email protected]>

Hi,


I am trying to kickstart the compute nodes with Rocks 3.0.0; the frontend is already working. I reviewed FAQ question 7.1.2, and the services (dhcpd, httpd, mysqld and autofs) are running, but running kickstart.cgi from the command line gives an error:

error - cannot kickstart external nodes

I made a quick search on the list, but without any success.

The compute node gets the assigned IP and insert-ethers detects the appliance without any trouble, but it fails to run kickstart.cgi from the frontend. The web server error log says something like this:

[Tue Dec 23 09:10:08 2003] [error] [client 10.255.255.254] malformed header from script. Bad header=# @Copyright@: /var/www/html/install/kickstart.cgi

While the access log says this:

10.255.255.254 - - [23/Dec/2003:09:10:08 -0400] "GET /install/kickstart.cgi?arch=i386&np=2&if=eth0&project=rocks HTTP/1.0" 500 587 "-" "-"

I ran insert-ethers with the Ethernet Switches option. My nodes are connected via 3 managed ethernet switches.

Any help will be appreciated.

Thanks in advance.

--
Kennie J. Cruz Gutierrez, System Administrator
Department of Electrical and Computer Engineering
University of Puerto Rico, Mayaguez Campus
Work Phone: (787) 832-4040 x 3798
Email: Kennie.Cruz at ece.uprm.edu
Web: http://ece.uprm.edu/~kennie/

[2003-12-23/09:21]
Black holes are created when God divides by zero!

From bruno at rocksclusters.org Tue Dec 23 08:33:39 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Tue, 23 Dec 2003 08:33:39 -0800
Subject: [Rocks-Discuss]Error installing the compute node
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

just to be clear, did you execute:

# cd /home/install
# ./kickstart.cgi --client compute-0-0

- gb


On Dec 23, 2003, at 5:31 AM, Kennie Cruz wrote:

> Hi,
>
> I am trying to kickstart the compute nodes with Rocks 3.0.0, the
> frontend is already working. I revised the FAQ question 7.1.2, the
> services (dhcpd, httpd, mysqld and autofs) are running, but running
> kickstart.cgi from the command line gives an error:
>
> error - cannot kickstart external nodes
>
> I made a quick search on the list, but without any success.
>
> The compute node gets the assigned IP and insert-ethers detects the
> appliance without any trouble, but fails to run the kickstart.cgi from
> the frontend. The web server error log says something like this:
>
> [Tue Dec 23 09:10:08 2003] [error] [client 10.255.255.254] malformed
> header from script. Bad header=# @Copyright@:
> /var/www/html/install/kickstart.cgi
>
> While the access log says this:
>
> 10.255.255.254 - - [23/Dec/2003:09:10:08 -0400] "GET
> /install/kickstart.cgi?arch=i386&np=2&if=eth0&project=rocks HTTP/1.0"
> 500 587 "-" "-"
>
> I ran insert-ethers with the Ethernet Switches option. My nodes are
> connected via 3 managed ethernet switches.
>
> Any help will be appreciated.
>
> Thanks in advance.
>
> --
> Kennie J. Cruz Gutierrez, System Administrator
> Department of Electrical and Computer Engineering
> University of Puerto Rico, Mayaguez Campus
> Work Phone: (787) 832-4040 x 3798
> Email: Kennie.Cruz at ece.uprm.edu
> Web: http://ece.uprm.edu/~kennie/
>
> [2003-12-23/09:21]
> Black holes are created when God divides by zero!

From daniel.kidger at quadrics.com Tue Dec 23 09:03:49 2003
From: daniel.kidger at quadrics.com (Daniel Kidger)
Date: Tue, 23 Dec 2003 17:03:49 +0000
Subject: [Rocks-Discuss]Lost nodes during cluster-kickstart?
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

Tomas Lindén wrote:

> To reinstall a cluster I use the command
>   cluster-fork /boot/kickstart/cluster-kickstart
> Now since all 32 nodes have been PXE installed this means that the
> reinstallation is performed by first doing a PXE-boot to load the
> installation kernel. My problem is that sometimes a few nodes fail
> during this reinstallation process.

Although I haven't PXE installed a Rocks cluster of this size, I have done PXE-based installs of (larger) RedHat clusters using a customised kickstart file. What can go wrong is that I have seen timeouts if too many nodes dhcp/tftp for their installer kernel simultaneously. You could try to increase the timeout, or better, not do too many at once - say start 8 at a time every 30 seconds. There is plenty of precedent for this in, say, the automated installer of the AlphaServer SC Tru64 clusters. Also, outside of Rocks, I have seen folk use multiple 'sub-master' nodes to act as tftp/http fileservers during the install process. It would be interesting to see what the Rocks developers' vision is for the scalable installation of large clusters.

--
Yours,
Daniel.

--------------------------------------------------------------
Dr. Dan Kidger, Quadrics Ltd.      daniel.kidger at quadrics.com
One Bridewell St., Bristol, BS1 2AA, UK          0117 915 5505
----------------------- www.quadrics.com --------------------

From mjk at sdsc.edu Tue Dec 23 09:44:14 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Tue, 23 Dec 2003 09:44:14 -0800
Subject: [Rocks-Discuss]Lost nodes during cluster-kickstart?
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

The problem is PXE has an extremely short timeout, and once it fails it does not retry. Since this is a BIOS thing, there isn't a lot to do. If you boot your compute nodes off of CDs (and avoid PXE), the problem goes away. This is because even if DHCP times out, we've modified our installation to be extremely aggressive in its DHCP requests, and the entire installation process will actually watchdog-timeout and restart if needed. Unfortunately, the PXE timeout cannot be fixed in the same way.

Our experience shows PXE scaling to 128 nodes for a mass re-install using current hardware. Older CPUs may show issues. The only answer right now is to stage your re-install so the PXE server can handle the load. This load is actually very low, but the PXE server for Linux is still maturing.
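The staging described above (and Daniel's "8 at a time every 30 seconds" suggestion) could be scripted along these lines (a sketch; batch size, delay, and node names are illustrative):

```shell
# Kick off reinstalls in batches of 8, pausing 30 seconds between batches
# so the PXE/DHCP server never sees all 32 nodes at once.
for i in $(seq 0 31); do
    ssh compute-0-$i /boot/kickstart/cluster-kickstart &
    if [ $(( (i + 1) % 8 )) -eq 0 ]; then
        sleep 30
    fi
done
wait
```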

-mjk


On Dec 23, 2003, at 5:28 AM, Tomas Lindén wrote:

> To reinstall a cluster I use the command
>   cluster-fork /boot/kickstart/cluster-kickstart
> Now since all 32 nodes have been PXE installed this means that the
> reinstallation is performed by first doing a PXE-boot to load the
> installation kernel. My problem is that sometimes a few nodes fail
> during this reinstallation process. The failing nodes seem to be different
> whenever this problem occurs. The really strange thing is that after
> more than a day or so some nodes somehow manage to finish the
> reinstallation process!
>
> Sometimes the whole cluster comes up fine without any lost node.
>
> The problematic nodes _seem_ to get the installation kernel with PXE, so
> it might be not a PXE problem but something odd that happens later?
>
> Has anyone seen anything like this before?
>
> I'm aware of a bug in the RedHat installation kernel
> on Athlon systems when trying to run with a serial console.
> https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-May/001988.html
> This is why I run the installation kernel without a serial console, but
> this makes debugging difficult because the serial console only shows
> output during the PXE boot process. No output is generated by the
> installation kernel itself. The next output is generated when
> the node has finished the installation and loads the final kernel which
> runs fine with a serial console.
>
> This is using Rocks 2.3.2 on a 32 node cluster with Tyan Tiger MPX
> S2466N-4M motherboards and dual Athlon MP CPUs with no graphics
> adapters, so the system has a 32 port serial console switch. The
> motherboards have integrated 100 Mb/s 3Com 3C920 NICs (in practice a
> 3C905 NIC). The switch is made by Enterasys. The frontend private NIC is
> also running at 100 Mb/s. When doing the cluster reinstallation the
> network bandwidth over the frontend NIC saturates at 12,5 MB/s. Maybe
> some packets are lost because of this?
>
> The frontend private ethernet connection will be upgraded to Gb/s.
> Hopefully this will solve this reinstallation problem.
>
> Do you have any other ideas how to solve this problem?
>
> Best regards, Tomas Lindén
> --------------------------------------------------------------------------
> Tomas Linden                   Helsinki Institute of Physics (HIP)
> Tomas.Linden at Helsinki.FI    P.O. Box 64 (Gustaf Hällströmin katu 2)
> phone: +358-9-191 505 63       FIN-00014 UNIVERSITY OF HELSINKI
> fax: +358-9-191 505 53         Finland
> WWW: http://www.physics.helsinki.fi/~tlinden/eindex.html
> --------------------------------------------------------------------------

From Timothy.Carlson at pnl.gov Tue Dec 23 08:57:07 2003
From: Timothy.Carlson at pnl.gov (Carlson, Timothy S)
Date: Tue, 23 Dec 2003 08:57:07 -0800
Subject: [Rocks-Discuss]Error installing the compute node
Message-ID: <[email protected]>

The problem he is having is that he chose "ethernet switches" when running insert-ethers. He should have chosen "Compute nodes".

Only choose "ethernet switches" when you are assigning an IP address to an ethernet switch with DHCP. If your managed switches already have IP addresses, then just install "compute nodes".

Tim

-----Original Message-----
From: Greg Bruno [mailto:bruno at rocksclusters.org]
Sent: Tuesday, December 23, 2003 8:34 AM
To: Kennie Cruz
Cc: npaci-rocks-discussion at sdsc.edu
Subject: Re: [Rocks-Discuss]Error installing the compute node

just to be clear, did you execute:

# cd /home/install
# ./kickstart.cgi --client compute-0-0

- gb

On Dec 23, 2003, at 5:31 AM, Kennie Cruz wrote:

> Hi,
>
> I am trying to kickstart the compute nodes with Rocks 3.0.0, the
> frontend is already working. I revised the FAQ question 7.1.2, the
> services (dhcpd, httpd, mysqld and autofs) are running, but running
> kickstart.cgi from the command line gives an error:
>
> error - cannot kickstart external nodes
>
> I made a quick search on the list, but without any success.
>
> The compute node gets the assigned IP and insert-ethers detects the
> appliance without any trouble, but fails to run the kickstart.cgi from
> the frontend. The web server error log says something like this:
>
> [Tue Dec 23 09:10:08 2003] [error] [client 10.255.255.254] malformed
> header from script. Bad header=# @Copyright@:
> /var/www/html/install/kickstart.cgi
>
> While the access log says this:
>
> 10.255.255.254 - - [23/Dec/2003:09:10:08 -0400] "GET
> /install/kickstart.cgi?arch=i386&np=2&if=eth0&project=rocks HTTP/1.0"
> 500 587 "-" "-"
>
> I ran insert-ethers with the Ethernet Switches option. My nodes are
> connected via 3 managed ethernet switches.
>
> Any help will be appreciated.
>
> Thanks in advance.
>
> --
> Kennie J. Cruz Gutierrez, System Administrator
> Department of Electrical and Computer Engineering
> University of Puerto Rico, Mayaguez Campus
> Work Phone: (787) 832-4040 x 3798
> Email: Kennie.Cruz at ece.uprm.edu
> Web: http://ece.uprm.edu/~kennie/
>
> [2003-12-23/09:21]
> Black holes are created when God divides by zero!

From purikk at hotmail.com Tue Dec 23 12:48:30 2003
From: purikk at hotmail.com (Purushotham Komaravolu)
Date: Tue, 23 Dec 2003 15:48:30 -0500
Subject: [Rocks-Discuss]beowulf and rocks
Message-ID: <[email protected]>

Hi,
I keep hearing people mention beowulf and Rocks; can somebody point me to
the difference between them? Are they just two different solutions for
clusters?
Thanks
Regards,
Puru

From tim.carlson at pnl.gov Tue Dec 23 13:19:39 2003
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Tue, 23 Dec 2003 13:19:39 -0800 (PST)
Subject: [Rocks-Discuss]beowulf and rocks
In-Reply-To: <[email protected]>
Message-ID: <[email protected]>


On Tue, 23 Dec 2003, Purushotham Komaravolu wrote:

> I keep hearing people mention beowulf and Rocks; can somebody point me to
> the difference between them? Are they just two different solutions for
> clusters?

Beowulf is a loose definition for a cluster of machines (typically off-the-shelf hardware). Beowulf is not software.

Rocks is a software solution to manage your beowulf.

You can compare rocks/oscar/scyld as software systems for your beowulf cluster.

Read Robert Brown's book on beowulfs at this URL

http://www.phy.duke.edu/~rgb/Beowulf/beowulf_book/beowulf_book/index.html

Tim

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support

From dlane at ap.stmarys.ca Tue Dec 23 14:53:51 2003
From: dlane at ap.stmarys.ca (Dave Lane)
Date: Tue, 23 Dec 2003 18:53:51 -0400
Subject: [Rocks-Discuss]beowulf and rocks
In-Reply-To: <[email protected]>
Message-ID: <[email protected]>

At 03:48 PM 12/23/2003 -0500, Purushotham Komaravolu wrote:
> Hi,
> I keep people mentioning about beowulf and Rocks, can somebody point me
> the difference between them. Are they just two different solutions for
> clusters?

Beowulf is a loosely-defined generic term (that I won't attempt to define now!), while Rocks is one of the several software distributions that implement a beowulf cluster.

... Dave

From junkscarce at hotmail.com Tue Dec 23 15:43:05 2003
From: junkscarce at hotmail.com (Reed Scarce)
Date: Tue, 23 Dec 2003 23:43:05 +0000
Subject: [Rocks-Discuss]Extend-compute.xml issue, ln creation fails
Message-ID: <[email protected]>

Within /export/home/install/profiles/2.3.2/site-nodes/extend-compute.xml lies code like this (comments mine):

<post>
/bin/mkdir /mnt/plc/                                  <-- works -->
/bin/mkdir /mnt/plc/plc_data                          <-- works -->
/bin/ln -s /mnt/plc_data /data1                       <-- works -->
/bin/ln /etc/rc.d/init.d/gpm /etc/rc.d/rc3.d/S15gpm   <-- fails to ln, source exists -->
</post>

I don't understand why the ln to a directory succeeds but an ln to a script fails.

BTW, Dr. Landman, I've attempted to use your build.pl but it seems to fail with:

Can't stat `/usr/src/redhat/RPMS/noarch//finishing-server-"3.0"-1.noarch.rpm .

(my note: the path ends at RPMS) I swear I thought I saw a solution to this once but I can't find it again. Upon reinstallation with the file your tool created (/usr/src/RedHat/RPMS/i386/finishing-scripts-3.00-1.i386.rpm) anaconda threw back the exception:

Traceback (innermost last):
  file "/usr/bin/anaconda.real", line 633, in ?
    intf.run(id, dispatch, configFileData)
  File "/usr/src/build.90289-i386/install//usr/lib/anaconda/text.py", line 427 in run
ok save debug

TIA Reed Scarce


From landman at scalableinformatics.com Tue Dec 23 16:17:58 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Tue, 23 Dec 2003 19:17:58 -0500
Subject: [Rocks-Discuss]Extend-compute.xml issue, ln creation fails
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

Hi Reed:

Which version of the finishing server fails on which version of ROCKS? It looks like 3.0. I am up to 3.1.0 now. With a little bit of modification I could make it work with 2.3.2. Likely just a single line to point to the right path.

Let me know and I'll see what I can do. I would recommend using the 3.1.0 environment, as it is a significant (read as massive) improvement over previous versions. If you (and others) need it to work with older (pre-3.0) versions of ROCKS, I think I can handle that. Let me know.

Joe

On Tue, 2003-12-23 at 18:43, Reed Scarce wrote:
> Within /export/home/install/profiles/2.3.2/site-nodes extend-compute.xml
> lies code like this commented code:
> <post>
> /bin/mkdir /mnt/plc/ <-- works -->
> /bin/mkdir /mnt/plc/plc_data <-- works -->
> /bin/ln -s /mnt/plc_data /data1 <-- works -->
> /bin/ln /etc/rc.d/init.d/gpm /etc/rc.d/rc3.d/S15gpm <-- fails to ln, source exists -->
> </post>
>
> I don't understand why the ln to a directory succeeds but a ln to a script
> fails.
>
> BTW, Dr. Landman, I've attempted to use your build.pl but it seems to fail
> with:
> Can't stat `/usr/src/redhat/RPMS/noarch//finishing-server-"3.0"-1.noarch.rpm

From mjk at sdsc.edu Tue Dec 23 16:35:13 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Tue, 23 Dec 2003 16:35:13 -0800
Subject: [Rocks-Discuss]Extend-compute.xml issue, ln creation fails
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

"man chkconfig"

If you use chkconfig you do not need to create the rc*.d/* files; they are put in place for you.
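In other words, the hard link in the <post> section could be replaced by something like this (a sketch; gpm is the service from the example, and the runlevels come from the "# chkconfig:" header inside the init script itself):

```shell
# Register the init script; chkconfig creates the rc*.d symlinks itself,
# based on the "# chkconfig:" header in /etc/rc.d/init.d/gpm.
/sbin/chkconfig --add gpm
/sbin/chkconfig gpm on    # enable at the script's default runlevels
```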

-mjk

On Dec 23, 2003, at 3:43 PM, Reed Scarce wrote:

> Within /export/home/install/profiles/2.3.2/site-nodes
> extend-compute.xml lies code like this commented code:
> <post>
> /bin/mkdir /mnt/plc/ <-- works -->
> /bin/mkdir /mnt/plc/plc_data <-- works -->
> /bin/ln -s /mnt/plc_data /data1 <-- works -->
> /bin/ln /etc/rc.d/init.d/gpm /etc/rc.d/rc3.d/S15gpm <-- fails to ln,
> source exists -->
> </post>
>
> I don't understand why the ln to a directory succeeds but a ln to a
> script fails.
>
> BTW, Dr. Landman, I've attempted to use your build.pl but it seems to
> fail with:
> Can't stat
> `/usr/src/redhat/RPMS/noarch//finishing-server-"3.0"-1.noarch.rpm .
> (my note: the path ends at RPMS) I swear I thought I saw a solution
> to this once but I can't find it again.
> Upon reinstallation with the file your tool created
> (/usr/src/RedHat/RPMS/i386/finishing-scripts-3.00-1.i386.rpm) anaconda
> threw back the exception: Traceback (innermost last): file
> "/usr/bin/anaconda.real", line 633, in ? intf.run(id, dispatch,
> configFileData) File
> "/usr/src/build.90289-i386/install//usr/lib/anaconda/text.py", line
> 427 in run
> ok save debug
>
> TIA Reed Scarce

From jkreuzig at uci.edu Tue Dec 23 19:53:16 2003
From: jkreuzig at uci.edu (James Kreuziger)
Date: Tue, 23 Dec 2003 19:53:16 -0800 (PST)
Subject: [Rocks-Discuss]Dell Power Connect 5224
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

Thanks everybody for the info. I was aware of the fast-link issue; however, after enabling it, we still were unable to see the switch from the frontend. We had a laptop hooked up to the switch via serial and ethernet and were able to turn on the fast-link and assign an IP address. After that, the web-based interface came up on the laptop. Still, no response on the switch from the frontend.

So after great gnashing of teeth, and dozens of re-installs of the frontend, success! The problem? The extra NIC card on the frontend. We had bought the frontend with a dual 1 Gb card and a single 100 Mb card. Whenever the single NIC card is installed, the system always takes this as eth0. This is something that was staring us right in the face, so that's why it probably took so long to figure out.

After 3 years of trying to find the money, we finally have our first 8 node cluster up!

-Jim

*************************************************
Jim Kreuziger
jkreuzig at uci.edu
949-824-4474
*************************************************

From landman at scalableinformatics.com Tue Dec 23 20:23:35 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Tue, 23 Dec 2003 23:23:35 -0500
Subject: [Rocks-Discuss]Dell Power Connect 5224
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]>
Message-ID: <[email protected]>

Hi James:

One of the things I do first time I boot up a new head node is to map


the ethernet ports. I take out all but one of the network wires, and make sure there is real network traffic. A ping on the subnet is fine. Then I tcpdump the network port. What is surprising to me is how many times the assumed network eth0 is mapped differently. Then by hand, after mapping the rest of the ports, I manually modify the /etc/modules.conf file to reflect what I need.

Just a suggestion. Having been bitten enough, I find simple sanity checks help reduce the size or dimensionality of the space of possible problems. This usually makes these debugging sessions faster, and allows for better characterization of the issue.
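Joe's mapping procedure could be sketched like this (illustrative; the interface count and packet count are arbitrary, and a silent port will sit waiting until interrupted):

```shell
# With only one network cable plugged in and some real traffic on that
# subnet, only the physically connected port shows packets.
# Interrupt with Ctrl-C if a port stays silent.
for i in 0 1 2; do
    echo "=== eth$i ==="
    tcpdump -i eth$i -c 5    # capture a few packets on this interface
done
```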

Joe

James Kreuziger wrote:

> Thanks everybody for the info. I was aware of the fast-link issue;
> However, after enabling it, we still were unable to see the switch
> from the frontend. We had a laptop hooked up to the switch via serial
> and ethernet and was able to turn on the fast-link, and assign an
> IP address. After that, the web-based interface came up on the laptop.
> Still, no response on the switch from the frontend.
>
> So after great gnashing of teeth, and dozens of re-installs of the
> frontend, success! The problem? The extra nic card on the frontend.
> We had bought the frontend with a dual 1GB card and a single 100MB card.
> Whenever the single nic card is installed, the system always takes this
> as eth0. This is something that was staring us right in the face, so
> that's why it probably took so long to figure out.
>
> After 3 years of trying to find the money, we finally have our first
> 8 node cluster up!
>
> -Jim
>
> *************************************************
> Jim Kreuziger
> jkreuzig at uci.edu
> 949-824-4474
> *************************************************

--
Joseph Landman, Ph.D
Scalable Informatics LLC
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
phone: +1 734 612 4615

From bruno at rocksclusters.org Tue Dec 23 21:26:08 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Tue, 23 Dec 2003 21:26:08 -0800
Subject: [Rocks-Discuss]Rocks 3.1.0 is released for x86, ia64 and x86-64
Message-ID: <[email protected]>


Version 3.1.0 (Matterhorn) of the Rocks cluster distribution is released and now supports three processor families: Intel IA-32, Intel Itanium Processor Family, and AMD Opteron. This is the released version of the software that was used to build a fully-functioning 128-node grid-enabled cluster in under 2 hours on opening night last month at SC2003 in Phoenix, AZ. Rocks is developed by the Grid and Cluster Computing Group at SDSC and by partners at the University of California, Berkeley, Scalable Systems in Singapore, and individual open-source software developers.

This is a co-release for x86 (Pentium, Athlon, and others), Itanium2 (IA-64) and Opteron (x86-64) based clusters. Software is freely available for download to burn onto a bootable CD set for x86 and x86-64 or a single DVD for Itanium2. Versions for all processor families are available at http://www.rocksclusters.org/.

Introduced in Version 3.0.0, this version enhances the "roll" mechanism to enable users, communities and others to easily add on optional software and configuration. These optional "Roll CDs" extend the system by integrating seamlessly and automatically into the management and packaging mechanisms used by the base software. For all intents and purposes, rolls appear as if they are part of the original CD distribution. A number of defined extension rolls are freely available and include HPC, Sun Grid Engine, Grid (based on NMI), Java and Intel Compiler. An important feature is that new rolls can be created or updated independently of the core distribution. This fundamentally enables science teams and communities to add on domain-specific software packages, define a particular grid configuration, or simply modify any of the default configuration or package settings.

New features in NPACI Rocks 3.1.0 include:

- Opteron support
- Sun Grid Engine as default queuing system
- Upgraded Ganglia server and client, used for collecting and visualizing cluster-wide monitoring metrics
- Upgraded MPICH-GM and Myrinet GM 2.0 for the latest Rev D cards
- Rocks-developed 411 information system to replace Network Information Service (NIS)
- Updated SSH version 3.7.1 with no login delay
- Several optional software rolls, including:

  - NSF Middleware Initiative version R4 grid distribution
  - Java 2
  - Intel Compilers for x86 and ia64

Rocks 3.1.0 is derived from Red Hat's publicly available source packages (SRPMS) used in portions of their Enterprise Linux 3.0 product line. All SRPMs have been recompiled to enable redistribution. All available updates for these packages have been pre-applied. Rocks-specific software and standard cluster and grid community software is then added to create a complete clustering toolkit. All Rocks source code is available in a public CVS repository.

From angel at miami.edu Wed Dec 24 13:14:59 2003
From: angel at miami.edu (Angel Li)
Date: Wed, 24 Dec 2003 16:14:59 -0500
Subject: [Rocks-Discuss]Rocks 3.1.0 is released for x86, ia64 and x86-64
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

Hi,

I currently have a cluster running Rocks 3.0 and I'm considering upgrading to 3.1. Now that SGE is the default batch queue, is maui working? Also, the Intel compiler roll is included. What licensing issues will I encounter? We currently have a license for version 7.

Thanks,

Angel

From bruno at rocksclusters.org Wed Dec 24 14:14:46 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Wed, 24 Dec 2003 14:14:46 -0800
Subject: [Rocks-Discuss]Rocks 3.1.0 is released for x86, ia64 and x86-64
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]>
Message-ID: <[email protected]>

> I currently have a cluster running Rocks 3.0 and I'm considering
> upgrading to 3.1. Now that SGE is the default batch queue, is maui
> working?

maui and pbs are currently not available in rocks 3.1, but they will be soon.

maui and pbs will be included in their own roll -- that effort will be driven by roy dragseth from the University of Tromsø.

> Also, the Intel compiler roll is included. What licensing issues will
> I encounter? We currently have a license for version 7.

i'm not sure how the licenses transfer between versions.

after you bring up a frontend with the intel roll, the following link is available on the frontend's home page:

http://www.intel.com/software/products/distributors/rock_cluster.htm

after you purchase a license, you just need to copy the license into the appropriate directory and then start compiling.

for fortran, the appropriate directory is:

/opt/intel_fc_80/licenses

and for C, the appropriate directory is:

/opt/intel_cc_80/licenses

also, the intel roll contains a pre-built MPICH environment -- it is found under /opt/mpich/intel.
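Using that MPICH build might look like this (a sketch; the program and machine-file names are illustrative, not from the message):

```shell
# Compile and run an MPI program with the Intel-compiled MPICH
# mentioned above.
/opt/mpich/intel/bin/mpicc -o hello hello.c
/opt/mpich/intel/bin/mpirun -np 4 -machinefile machines ./hello
```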

- gb

From cdwan at mail.ahc.umn.edu Wed Dec 24 14:17:28 2003
From: cdwan at mail.ahc.umn.edu (Chris Dwan (CCGB))
Date: Wed, 24 Dec 2003 16:17:28 -0600 (CST)
Subject: [Rocks-Discuss]Dell Power Connect 5224
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]> <[email protected]>
Message-ID: <[email protected]>

Once upon a time, I decided to install a third interface in a rocks head node (Dell SC1400, and a Syskonnect 98x gig NIC for the interested) for a data network. At boot time *everything* was broken.

To make a long story less long, the system had remapped itself with the new gig card as eth0, and the other two shifted up by one. That was really close to "no fun at all."

Happy holidays! I'm burning the new release right now!

-C

From michal at harddata.com Wed Dec 24 15:05:43 2003
From: michal at harddata.com (Michal Jaegermann)
Date: Wed, 24 Dec 2003 16:05:43 -0700
Subject: [Rocks-Discuss]Dell Power Connect 5224
In-Reply-To: <[email protected]>; from [email protected] on Wed, Dec 24, 2003 at 04:17:28PM -0600
References: <[email protected]> <[email protected]> <[email protected]> <[email protected]>
Message-ID: <[email protected]>

On Wed, Dec 24, 2003 at 04:17:28PM -0600, Chris Dwan (CCGB) wrote:
>
> Once upon a time, I decided to install a third interface in a rocks head
> node (Dell SC1400, and a Syskonnect 98x Gig NIC for the interested) for a
> data network. At boot time *everything* was broken.

I still cannot understand why people insist on NOT using the 'nameif' utility. All network interfaces can be named whichever way you want and they will not move regardless of how many NICs you add or remove, as long as the MACs are not changed. If you replace a card with a different one then /etc/mactab needs to be edited to reflect your new configuration. On client nodes with an automatic reinstall this indeed is not practical, but for your front end machine this is another story.

It is indeed the case that the default startup scripts from Red Hat 7.3 need some simple additions, as interface (re)naming needs to be done before NICs are brought up for the first time. In RH9 and FC1, 'nameif' will be used "automagically" if the HWADDR variable is defined (and with a correct value).

Of course if you have different drivers for different NICs, and they are loaded as modules, then names can be assigned by editing /etc/modules.conf

Michal
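A minimal sketch of the two mechanisms Michal describes; the MAC addresses, interface names, and driver modules below are made up for illustration:

```
# /etc/mactab -- consumed by 'nameif'; pins each interface name to a MAC
eth0  00:0A:5E:11:22:33    # onboard NIC, private cluster network
eth1  00:0A:5E:44:55:66    # second onboard NIC, public network
gig0  00:00:5A:77:88:99    # add-in gigabit card, data network

# /etc/modules.conf -- per-driver naming when each NIC uses its own module
alias eth0 e100
alias gig0 sk98lin
```

With the mactab approach the names survive adding or removing cards, since they follow MACs rather than PCI probe order.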

From bruno at rocksclusters.org Wed Dec 24 15:41:25 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Wed, 24 Dec 2003 15:41:25 -0800
Subject: [Rocks-Discuss]Dell Power Connect 5224
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]> <[email protected]> <[email protected]> <[email protected]>
Message-ID: <[email protected]>

>> Once upon a time, I decided to install a third interface in a rocks
>> head
>> node (Dell SC1400, and a Syskonnect 98x Gig NIC for the interested)
>> for a
>> data network. At boot time *everything* was broken.
>
> I still cannot understand why people insists on NOT using 'nameif'
> utility. All network interfaces can be named whichever way you want
> and they will not move regardless how many NICs you will add or
> remove as long as MACs are not changed. If you replace a card with
> a different one then /etc/mactab needs to be edited to reflect your
> new configuration. On clients nodes with an automatic reinstall
> this indeed is not practical but for your front end machine this is
> another story.
>
> It is indeed the case that default startup scripts from Red Hat 7.3
> need some simple additions as interface (re)naming need to be done
> before NICs are brought up for the first time. In RH9 and FC1
> 'nameif' will be used "automagically" if HWADDR variable is defined
> (and with a correct value).

michal,

for this release, we looked at your suggestion of using nameif -- we did a quick prototype and it looks like it will be the right thing to do. we sketched out a design and found that the full solution will require many pieces (database changes, installer changes and the obvious XML file changes). we left this out of 3.1.0 but it is towards the top of our list for the next release.

thanks for the suggestion of nameif -- it is suggestions like that which help us to define the direction of rocks.

- gb


From landman at scalableinformatics.com Wed Dec 24 16:08:54 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Wed, 24 Dec 2003 19:08:54 -0500
Subject: [Rocks-Discuss]Dell Power Connect 5224
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]> <[email protected]> <[email protected]> <[email protected]>
Message-ID: <[email protected]>

Michal Jaegermann wrote:

> On Wed, Dec 24, 2003 at 04:17:28PM -0600, Chris Dwan (CCGB) wrote:
>
>> Once upon a time, I decided to install a third interface in a rocks head
>> node (Dell SC1400, and a Syskonnect 98x Gig NIC for the interested) for a
>> data network. At boot time *everything* was broken.
>
> I still cannot understand why people insists on NOT using 'nameif'
> utility. All network interfaces can be named whichever way you want
> and they will not move regardless how many NICs you will add or
> remove as long as MACs are not changed. If you replace a card with
> a different one then /etc/mactab needs to be edited to reflect your
> new configuration. On clients nodes with an automatic reinstall
> this indeed is not practical but for your front end machine this is
> another story.

Agreed, though as far as I can tell, nameif is not used in the /etc/init.d scripts. It is used by ifup, so you would have to set HWADDR on each interface in the /etc/sysconfig/.../ifcfg-eth* files (the ... refers to the fact that RH9 and RHEL3 have moved where these things sit from what we were used to in RH7.x). You still need to map the interfaces, though, to see which physical port corresponds to which device/MAC address. With that in hand, you can set up the HWADDR or just swap cables. With the advent of folks making exactly the right length cables (e.g. not giving any play, and placing them under tension while plugged in...) the cable swap doesn't work well for mapping on some systems. Moreover, on a fair number of systems I have played with, the BIOS is set up so that if they PXE boot, they are doing so from the address that the installed version of ROCKS would see as eth1. Annoying.

--
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
phone: +1 734 612 4615
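The HWADDR pinning described above looks roughly like this in an ifcfg file; the MAC and IP values here are invented for illustration:

```
# /etc/sysconfig/network-scripts/ifcfg-eth0 -- sketch; values are made up.
# With HWADDR set, ifup will only bring this configuration up on the
# device whose MAC matches, so the name stays pinned across reboots
# even if the kernel probes the cards in a different order.
DEVICE=eth0
HWADDR=00:0A:5E:11:22:33
ONBOOT=yes
BOOTPROTO=static
IPADDR=10.1.1.1
NETMASK=255.255.255.0
```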


From junkscarce at hotmail.com Fri Dec 26 15:35:57 2003
From: junkscarce at hotmail.com (Reed Scarce)
Date: Fri, 26 Dec 2003 23:35:57 +0000
Subject: [Rocks-Discuss]Extend-compute.xml issue, ln creation fails
Message-ID: <[email protected]>

The line:

chkconfig --level 3 gpm on

works great from the command line, not in extend-compute.xml. Thanks for the new tool though, always glad. The line above is in a block without <eval shell="bash"> tags. I'll keep trying and rtm. Is it possible this is a 2.6.2 issue? The live environment restricts me from using a more recent version.

>From: "Mason J. Katz" <mjk at sdsc.edu>
>To: "Reed Scarce" <junkscarce at hotmail.com>
>CC: npaci-rocks-discussion at sdsc.edu
>Subject: Re: [Rocks-Discuss]Extend-compute.xml issue, ln creation fails
>Date: Tue, 23 Dec 2003 16:35:13 -0800
>
>"man chkconfig"
>
>If you use chkconfig you do not need to create the rc*.d/* files and they
>are put in place for you.
>
> -mjk
>
>On Dec 23, 2003, at 3:43 PM, Reed Scarce wrote:
>
>>Within /export/home/install/profiles/2.3.2/site-nodes extend-compute.xml
>>lies code like this commented code:
>><post>
>>/bin/mkdir /mnt/plc/ <-- works -->
>>/bin/mkdir /mnt/plc/plc_data <-- works -->
>>/bin/ln -s /mnt/plc_data /data1 <-- works -->
>>/bin/ln /etc/rc.d/init.d/gpm /etc/rc.d/rc3.d/S15gpm <-- fails to ln,
>>source exists -->
>></post>
>>
>>I don't understand why the ln to a directory succeeds but a ln to a script
>>fails.
>>
>>BTW, Dr. Landman, I've attempted to use your build.pl but it seems to
>>faill with:
>>Can't stat
>>`/usr/src/redhat/RPMS/noarch//finishing-server-"3.0"-1.noarch.rpm .
>>(my note: the path ends at RPMS) I swear I thought I saw a solution to
>>this once but I can't find it again.
>>Upon reinstallation with the file your tool created
>>(/usr/src/RedHat/RPMS/i386/finishing-scripts-3.00-1.i386.rpm) anaconda
>>threw back the exception: Traceback (innermost last): file
>>"/usr/bin/anaconda.real", line 633, in ? intf.run(id, dispatch,
>>configFileData) File
>>"/usr/src/build.90289-i386/install//usr/lib/anaconda/text.py", line 427 in
>>run
>>ok save debug
>>
>>
>>TIA Reed Scarce
>


From mjk at sdsc.edu Fri Dec 26 16:46:22 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Fri, 26 Dec 2003 16:46:22 -0800
Subject: [Rocks-Discuss]Extend-compute.xml issue, ln creation fails
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

Not sure if this answers your question. But..

The <eval></eval> blocks are for code to be run on the kickstart server (the one that generates the kickstart file). Code outside of the eval blocks is run on the kickstarting host.

-mjk
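Putting Mason's two execution contexts side by side, a minimal extend-compute.xml sketch might look like this. The gpm line comes from this thread; the eval body and its output are invented for illustration:

```
<post>
<!-- Outside <eval>: runs on the kickstarting compute node itself,
     after the packages have been installed. -->
/sbin/chkconfig --level 3 gpm on

<eval shell="bash">
<!-- Inside <eval>: runs on the frontend while it generates the
     kickstart file; whatever this prints is spliced into that file. -->
echo "# kickstart file generated $(date)"
</eval>
</post>
```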

On Dec 26, 2003, at 3:35 PM, Reed Scarce wrote:

> The line:
>
> chkconfig --level 3 gpm on
>
> works great from the command line, not in extend-compute.xml. Thanks
> for the new tool though, always glad. The line above is in a block
> without <eval shell="bash"> tags. I'll keep trying and rtm. Is it
> possible this is a 2.6.2 issue? The live environment restricts me
> from using a more recent version.
>
>> From: "Mason J. Katz" <mjk at sdsc.edu>
>> To: "Reed Scarce" <junkscarce at hotmail.com>
>> CC: npaci-rocks-discussion at sdsc.edu
>> Subject: Re: [Rocks-Discuss]Extend-compute.xml issue, ln creation
>> fails
>> Date: Tue, 23 Dec 2003 16:35:13 -0800
>>
>> "man chkconfig"
>>
>> If you use chkconfig you do not need to create the rc*.d/* files and
>> they are put in place for you.
>>
>> -mjk
>>
>> On Dec 23, 2003, at 3:43 PM, Reed Scarce wrote:
>>
>>> Within /export/home/install/profiles/2.3.2/site-nodes
>>> extend-compute.xml lies code like this commented code:
>>> <post>
>>> /bin/mkdir /mnt/plc/ <-- works -->
>>> /bin/mkdir /mnt/plc/plc_data <-- works -->
>>> /bin/ln -s /mnt/plc_data /data1 <-- works -->
>>> /bin/ln /etc/rc.d/init.d/gpm /etc/rc.d/rc3.d/S15gpm <-- fails to ln,
>>> source exists -->
>>> </post>
>>>
>>> I don't understand why the ln to a directory succeeds but a ln to a
>>> script fails.
>>>
>>> BTW, Dr. Landman, I've attempted to use your build.pl but it seems
>>> to faill with:
>>> Can't stat
>>> `/usr/src/redhat/RPMS/noarch//finishing-server-"3.0"-1.noarch.rpm .
>>> (my note: the path ends at RPMS) I swear I thought I saw a solution
>>> to this once but I can't find it again.
>>> Upon reinstallation with the file your tool created
>>> (/usr/src/RedHat/RPMS/i386/finishing-scripts-3.00-1.i386.rpm)
>>> anaconda threw back the exception: Traceback (innermost last): file
>>> "/usr/bin/anaconda.real", line 633, in ? intf.run(id, dispatch,
>>> configFileData) File
>>> "/usr/src/build.90289-i386/install//usr/lib/anaconda/text.py", line
>>> 427 in run
>>> ok save debug
>>>
>>>
>>> TIA Reed Scarce

From apseyed at bu.edu Sat Dec 27 12:32:40 2003
From: apseyed at bu.edu (apseyed at bu.edu)
Date: Sat, 27 Dec 2003 15:32:40 -0500
Subject: [Rocks-Discuss]Re: npaci-rocks-discussion digest, Vol 1 #663 - 2 msgs
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

For what it's worth,

Why don't you try specifying the absolute path (/sbin/chkconfig) and echoing debug output to a file? (If you can confirm /sbin is in $PATH for the life of the script, never mind the first suggestion.)

echo "got to chkconfig beginning" > /tmp/ks.log
/sbin/chkconfig --level 3 gpm on
echo "got to chkconfig end" >> /tmp/ks.log
/sbin/chkconfig --list | grep gpm >> /tmp/ks.log

-Patrice

Quoting npaci-rocks-discussion-request at sdsc.edu:

> Send npaci-rocks-discussion mailing list submissions to> npaci-rocks-discussion at sdsc.edu> > To subscribe or unsubscribe via the World Wide Web, visit> http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion> or, via email, send a message with subject or body 'help' to> npaci-rocks-discussion-request at sdsc.edu> > You can reach the person managing the list at> npaci-rocks-discussion-admin at sdsc.edu> > When replying, please edit your Subject line so it is more specific> than "Re: Contents of npaci-rocks-discussion digest..."> > > Today's Topics:> > 1. Re: Extend-compute.xml issue, ln creation fails (Reed Scarce)> 2. Re: Extend-compute.xml issue, ln creation fails (Mason J.> Katz)> > --__--__--> > Message: 1> From: "Reed Scarce" <junkscarce at hotmail.com>> To: mjk at sdsc.edu> Cc: npaci-rocks-discussion at sdsc.edu> Subject: Re: [Rocks-Discuss]Extend-compute.xml issue, ln creation> fails> Date: Fri, 26 Dec 2003 23:35:57 +0000> > The line:> > chkconfig --level 3 gpm on> > works great from the command line, not in extend-compute.xml. Thanks> for > the new tool though, always glad. The line above is in a block> without > <eval shell="bash"> tags. I'll keep trying and rtm. Is it possible> this is > a 2.6.2 issue? The live environment restricts me from using a more> recent > version.> > > >From: "Mason J. Katz" <mjk at sdsc.edu>> >To: "Reed Scarce" <junkscarce at hotmail.com>> >CC: npaci-rocks-discussion at sdsc.edu> >Subject: Re: [Rocks-Discuss]Extend-compute.xml issue, ln creation


> fails> >Date: Tue, 23 Dec 2003 16:35:13 -0800> >> >"man chkconfig"> >> >If you use chkconfig you do not need to create the rc*.d/* files and> they > >are put in place for you.> >> > -mjk> >> >On Dec 23, 2003, at 3:43 PM, Reed Scarce wrote:> >> >>Within /export/home/install/profiles/2.3.2/site-nodes> extend-compute.xml > >>lies code like this commented code:> >><post>> >>/bin/mkdir /mnt/plc/ <-- works -->> >>/bin/mkdir /mnt/plc/plc_data <-- works -->> >>/bin/ln -s /mnt/plc_data /data1 <-- works -->> >>/bin/ln /etc/rc.d/init.d/gpm /etc/rc.d/rc3.d/S15gpm <-- fails to> ln, > >>source exists -->> >></post>> >>> >>I don't understand why the ln to a directory succeeds but a ln to a> script > >>fails.> >>> >>BTW, Dr. Landman, I've attempted to use your build.pl but it seems> to > >>faill with:> >>Can't stat > >>`/usr/src/redhat/RPMS/noarch//finishing-server-"3.0"-1.noarch.rpm> .> >>(my note: the path ends at RPMS) I swear I thought I saw a> solution to > >>this once but I can't find it again.> >>Upon reinstallation with the file your tool created > >>(/usr/src/RedHat/RPMS/i386/finishing-scripts-3.00-1.i386.rpm)> anaconda > >>threw back the exception: Traceback (innermost last): file > >>"/usr/bin/anaconda.real", line 633, in ? intf.run(id, dispatch, > >>configFileData) File > >>"/usr/src/build.90289-i386/install//usr/lib/anaconda/text.py", line> 427 in > >>run> >>ok save debug> >>> >>> >>TIA Reed Scarce> >>> >>_________________________________________________________________> >>Tired of slow downloads? Compare online deals from your local> high-speed > >>providers now. https://broadband.msn.com> >> > _________________________________________________________________


> Worried about inbox overload? Get MSN Extra Storage now! > http://join.msn.com/?PAGE=features/es> > > --__--__--> > Message: 2> Cc: npaci-rocks-discussion at sdsc.edu> From: "Mason J. Katz" <mjk at sdsc.edu>> Subject: Re: [Rocks-Discuss]Extend-compute.xml issue, ln creation> fails> Date: Fri, 26 Dec 2003 16:46:22 -0800> To: "Reed Scarce" <junkscarce at hotmail.com>> > Not sure if this answers your question. But..> > The <eval></eval> blocks are for code to be run on the kickstart> server > (the one the generates the kickstart file). Code outside of the eval> > blocks is run on the kickstarting host.> > -mjk> > > On Dec 26, 2003, at 3:35 PM, Reed Scarce wrote:> > > The line:> >> > chkconfig --level 3 gpm on> >> > works great from the command line, not in extend-compute.xml. > Thanks > > for the new tool though, always glad. The line above is in a block> > > without <eval shell="bash"> tags. I'll keep trying and rtm. Is it> > > possible this is a 2.6.2 issue? The live environment restricts me> > > from using a more recent version.> >> >> >> From: "Mason J. Katz" <mjk at sdsc.edu>> >> To: "Reed Scarce" <junkscarce at hotmail.com>> >> CC: npaci-rocks-discussion at sdsc.edu> >> Subject: Re: [Rocks-Discuss]Extend-compute.xml issue, ln creation> > >> fails> >> Date: Tue, 23 Dec 2003 16:35:13 -0800> >>> >> "man chkconfig"> >>> >> If you use chkconfig you do not need to create the rc*.d/* files> and > >> they are put in place for you.> >>> >> -mjk> >>> >> On Dec 23, 2003, at 3:43 PM, Reed Scarce wrote:


> >>> >>> Within /export/home/install/profiles/2.3.2/site-nodes > >>> extend-compute.xml lies code like this commented code:> >>> <post>> >>> /bin/mkdir /mnt/plc/ <-- works -->> >>> /bin/mkdir /mnt/plc/plc_data <-- works -->> >>> /bin/ln -s /mnt/plc_data /data1 <-- works -->> >>> /bin/ln /etc/rc.d/init.d/gpm /etc/rc.d/rc3.d/S15gpm <-- fails to> ln, > >>> source exists -->> >>> </post>> >>>> >>> I don't understand why the ln to a directory succeeds but a ln to> a > >>> script fails.> >>>> >>> BTW, Dr. Landman, I've attempted to use your build.pl but it> seems > >>> to faill with:> >>> Can't stat > >>> `/usr/src/redhat/RPMS/noarch//finishing-server-"3.0"-1.noarch.rpm> .> >>> (my note: the path ends at RPMS) I swear I thought I saw a> solution > >>> to this once but I can't find it again.> >>> Upon reinstallation with the file your tool created > >>> (/usr/src/RedHat/RPMS/i386/finishing-scripts-3.00-1.i386.rpm) > >>> anaconda threw back the exception: Traceback (innermost last):> file > >>> "/usr/bin/anaconda.real", line 633, in ? intf.run(id, dispatch,> > >>> configFileData) File > >>> "/usr/src/build.90289-i386/install//usr/lib/anaconda/text.py",> line > >>> 427 in run> >>> ok save debug> >>>> >>>> >>> TIA Reed Scarce> >>>> >>>> _________________________________________________________________> >>> Tired of slow downloads? Compare online deals from your local > >>> high-speed providers now. https://broadband.msn.com> >>> >> > _________________________________________________________________> > Worried about inbox overload? Get MSN Extra Storage now! > > http://join.msn.com/?PAGE=features/es> > > > --__--__--> > _______________________________________________> npaci-rocks-discussion mailing list> npaci-rocks-discussion at sdsc.edu> http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion>


> > End of npaci-rocks-discussion Digest>

From rocks_india at yahoo.co.in Sat Dec 27 20:20:40 2003
From: rocks_india at yahoo.co.in (Rocks India)
Date: Sun, 28 Dec 2003 04:20:40 +0000 (GMT)
Subject: [Rocks-Discuss]Rocks 3.0 Newbeeeeeeee
Message-ID: <[email protected]>

Hello All, I am new to Rocks. I was able to download and install Rocks 3.0. I am not sure if Globus 3.0 gets installed during the installation process. I tried to use simple CA commands and get a "command not found" error. Do I need to download the Globus Tool Kit and install it, or would it be installed along with Rocks?

Or can anyone direct me to a site, or give me the steps that need to be taken after installing Rocks, for working with Globus?

Rocks-India


From bruno at rocksclusters.org Sat Dec 27 21:35:28 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Sat, 27 Dec 2003 21:35:28 -0800
Subject: [Rocks-Discuss]Rocks 3.0 Newbeeeeeeee
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

> I am new to Rocks, i was able to download
> and
> install Rocks 3.0. I am not sure if Globus 3.0 gets
> installed during the installation process.I tried to
> use simple ca commands and get command not found
> error.
> Do i need to download Globus Tool Kit and
> install it or would it be installed along with rocks.
>
> Or can any one direct me to a site or give me steps
> that
> need to be taken after installing rocks what need to
> be done for manipulating globus

here are the steps, but note it requires reinstalling your frontend:

go to:


http://www.rocksclusters.org/rocks-documentation/3.1.0/iso-images.html

and download:

Rocks Base, HPC Roll, SGE Roll and the Grid Roll

then burn them all to CD.

then follow the directions at:

http://www.rocksclusters.org/rocks-documentation/3.1.0/install-frontend.html

but, before you get started, you should consult this page too:

http://rocks.npaci.edu/roll-documentation/grid/3.0/adding-the-roll.html

at the end of the process, your frontend will be configured with globus.

- gb

From ramonjt at ucia.gov Mon Dec 29 09:08:45 2003
From: ramonjt at ucia.gov (ramonjt)
Date: Mon, 29 Dec 2003 12:08:45 -0500
Subject: [Rocks-Discuss]Rocks 3.1.0
Message-ID: <[email protected]>

Folks,

Which set of Rocks 3.1.0 downloads support Xeon Processors, "Pentium and Athlon" or "Itanium"?

Thanks,
Ramon

From bruno at rocksclusters.org Mon Dec 29 09:31:56 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Mon, 29 Dec 2003 09:31:56 -0800
Subject: [Rocks-Discuss]Rocks 3.1.0
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

> Which set of Rocks 3.1.0 downloads support Xeon Processors, "Pentium
> and Athlon" or "Itanium"?

xeons are x86 processors -- so you want the ISO images found under the section:

Software for x86 (Pentium and Athlon)

- gb


From landman at scalableinformatics.com Mon Dec 29 10:49:49 2003
From: landman at scalableinformatics.com (landman)
Date: Mon, 29 Dec 2003 13:49:49 -0500
Subject: [Rocks-Discuss]3.1.0 surprises
Message-ID: <[email protected]>

Pulled the distro. Burned it after checking md5's. Ok. Booted/installed test cluster, completely vanilla, just defaults.

SSH is too slow. Wow. 5-10 seconds to log in.

Ok, now out at a customer site with the disks.

Unhappily discovered that the following are missing:

a) md (e.g. Software RAID): Just try to build one. Anaconda will happily let you do this ... though it will die in the formatting stages. Dropping into the shell (Alt-F2) and looking for the md module (lsmod) shows nothing. Insmodding the md module also doesn't do anything. Catting /proc/devices shows no md as a character or block device.

If md is really not there anymore, it should be removed from anaconda, just like ...

b) ext3. There is no ext3 available for the install.

Also discovered how incredibly fragile anaconda is. In order to install, you have to wipe the disks. It will not install if there is an md (software raid) device, choosing instead to crap out after you have entered all the information. To say that this is annoying is a slight understatement. This is an anaconda issue, not a ROCKS issue, though as a result of this issue, ROCKS is less functional than it could be.

I also noted that there is no xfs option. This means that I will need to hack new kernels later on after the install. Moreover, I will also need to turn on the ext3 journaling features later on (post install).

Hopefully 3.1.1 or 3.2 will fix some of these things.

Joe

--
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
phone: +1 734 612 4615

From junkscarce at hotmail.com Mon Dec 29 15:15:52 2003
From: junkscarce at hotmail.com (Reed Scarce)
Date: Mon, 29 Dec 2003 23:15:52 +0000
Subject: [Rocks-Discuss]Extend-compute.xml issue, ln creation fails
Message-ID: <[email protected]>


Are there any examples of Rocks 2.3.2 extend-compute.xml scripts that work? I need to know the limitations of the distribution. As far as I can tell the commands are available (`which command` locates the commands fine) but they don't necessarily perform the job as expected. I had seen the `eval...` clarification in the archives.

As it stands I plan to mkdir, ln and echo in the extend-c... but then run the heart of the customization (scripted) once the nodes are up. It just doesn't seem to be what was intended.

As always, thanks for your help
--Reed

>From: "Mason J. Katz" <mjk at sdsc.edu>>To: "Reed Scarce" <junkscarce at hotmail.com>>CC: npaci-rocks-discussion at sdsc.edu>Subject: Re: [Rocks-Discuss]Extend-compute.xml issue, ln creation fails>Date: Fri, 26 Dec 2003 16:46:22 -0800>>Not sure if this answers your question. But..>>The <eval></eval> blocks are for code to be run on the kickstart server >(the one the generates the kickstart file). Code outside of the eval >blocks is run on the kickstarting host.>> -mjk>>>On Dec 26, 2003, at 3:35 PM, Reed Scarce wrote:>>>The line:>>>>chkconfig --level 3 gpm on>>>>works great from the command line, not in extend-compute.xml. Thanks for >>the new tool though, always glad. The line above is in a block without >><eval shell="bash"> tags. I'll keep trying and rtm. Is it possible this >>is a 2.6.2 issue? The live environment restricts me from using a more >>recent version.>>>>>>>From: "Mason J. Katz" <mjk at sdsc.edu>>>>To: "Reed Scarce" <junkscarce at hotmail.com>>>>CC: npaci-rocks-discussion at sdsc.edu>>>Subject: Re: [Rocks-Discuss]Extend-compute.xml issue, ln creation fails>>>Date: Tue, 23 Dec 2003 16:35:13 -0800>>>>>>"man chkconfig">>>>>>If you use chkconfig you do not need to create the rc*.d/* files and they >>>are put in place for you.>>>>>> -mjk>>>>>>On Dec 23, 2003, at 3:43 PM, Reed Scarce wrote:>>>>>>>Within /export/home/install/profiles/2.3.2/site-nodes extend-compute.xml >>>>lies code like this commented code:


>>>><post>>>>>/bin/mkdir /mnt/plc/ <-- works -->>>>>/bin/mkdir /mnt/plc/plc_data <-- works -->>>>>/bin/ln -s /mnt/plc_data /data1 <-- works -->>>>>/bin/ln /etc/rc.d/init.d/gpm /etc/rc.d/rc3.d/S15gpm <-- fails to ln, >>>>source exists -->>>>></post>>>>>>>>>I don't understand why the ln to a directory succeeds but a ln to a >>>>script fails.>>>>>>>>BTW, Dr. Landman, I've attempted to use your build.pl but it seems to >>>>faill with:>>>>Can't stat >>>>`/usr/src/redhat/RPMS/noarch//finishing-server-"3.0"-1.noarch.rpm .>>>>(my note: the path ends at RPMS) I swear I thought I saw a solution to >>>>this once but I can't find it again.>>>>Upon reinstallation with the file your tool created >>>>(/usr/src/RedHat/RPMS/i386/finishing-scripts-3.00-1.i386.rpm) anaconda >>>>threw back the exception: Traceback (innermost last): file >>>>"/usr/bin/anaconda.real", line 633, in ? intf.run(id, dispatch, >>>>configFileData) File >>>>"/usr/src/build.90289-i386/install//usr/lib/anaconda/text.py", line 427 >>>>in run>>>>ok save debug>>>>>>>>>>>>TIA Reed Scarce>>>>>>>>_________________________________________________________________>>>>Tired of slow downloads? Compare online deals from your local high-speed >>>>providers now. https://broadband.msn.com>>>>>>>_________________________________________________________________>>Worried about inbox overload? Get MSN Extra Storage now! >>http://join.msn.com/?PAGE=features/es>


From dlane at ap.stmarys.ca Mon Dec 29 15:44:23 2003
From: dlane at ap.stmarys.ca (Dave Lane)
Date: Mon, 29 Dec 2003 19:44:23 -0400
Subject: [Rocks-Discuss]Extend-compute.xml issue, ln creation fails
In-Reply-To: <[email protected]>
Message-ID: <[email protected]>

At 11:15 PM 12/29/2003 +0000, Reed Scarce wrote:
>Are there any examples of Rocks 2.3.2 extend-compute.xml scripts that work?

Reed,

Below is a script that worked fine for me (with 2.3.2). What it does should be fairly self-explanatory.
Dave


--->>>

<post>
<!-- Insert your post installation script here. This code will be
executed on the destination node after the packages have been
installed. Typically configuration files are built and services
setup in this section. -->

mv /usr/local /usr/local-old
ln -s /home/local /usr/local
ln -s /home/opt/intel /opt/intel
ln -s /home/disc15 /disc15
mkdir /scratch/tmp
chmod 1777 /scratch/tmp
echo '#!/bin/bash' > /etc/init.d/wait
echo 'sleep 60' >> /etc/init.d/wait
chmod +x /etc/init.d/wait
ln -s /etc/init.d/wait /etc/rc3.d/S11wait
ln -s /etc/init.d/wait /etc/rc4.d/S11wait
ln -s /etc/init.d/wait /etc/rc5.d/S11wait

<eval sh="python">
<!-- This is python code that will be executed on the frontend node
during kickstart generation. You may contact the database, make
network queries, etc. These sections are generally used to help
build more complex configuration files. The 'sh' attribute may
point to any language interpreter such as "bash", "perl", "ruby",
etc. -->
</eval>
</post>

From bruno at rocksclusters.org Mon Dec 29 19:03:25 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Mon, 29 Dec 2003 19:03:25 -0800
Subject: [Rocks-Discuss]3.1.0 surprises
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

> Pulled the distro. Burned it after checking md5's. Ok.
> Booted/installed test
> cluster, completely vanilla, just defaults.

i'm assuming this is an x86 installation, yes?

> SSH is too slow. Wow. 5-10 seconds to log in.

that is not the case on our clusters. in fact, we tested this on all three architectures and all three are 'fast'.

> Ok, now out at a customer site with the disks.
>
> Unhappily discovered that the following are missing:
>


> a) md (e.g. Software RAID): Just try to build one. Anaconda will
> happily let
> you do this ... though it will die in the formatting stages. Dropping
> into the
> shell (Alt-F2) and looking for the md module (lsmod) shows nothing.
> Insmod the
> md also doesn't do anything. Catting /proc/devices shows no md as a
> character
> or block device.
>
> If md is really not there anymore, it should be removed from anaconda,
> just like ...
>
> b) ext3. There is no ext3 available for the install.
>
> Also discovered how incredibly fragile anaconda is. In order to
> install, you
> have to wipe the disks. It will not install if there is an md
> (software raid)
> device, chosing instead to crap out after you have entered in all the
> information. To say that this is annoying is a slight understatement.
> This is
> an anaconda issue, not a ROCKS issue, though as a result of this
> issue, ROCKS is
> less functional than it could be.

we'll look into the above two issues.

> I also noted that there is no xfs option. This means that I will need
> to hack
> new kernels later on after the install.

just curious, is xfs offered as an option on other redhat supported products?

also (and i'm assuming this will be no consolation to you, but it may be to others), building a new kernel RPM is straightforward in rocks:

http://www.rocksclusters.org/rocks-documentation/3.1.0/customization-kernel.html

- gb

From landman at scalableinformatics.com Mon Dec 29 19:44:16 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Mon, 29 Dec 2003 22:44:16 -0500
Subject: [Rocks-Discuss]3.1.0 surprises
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]>
Message-ID: <[email protected]>

On Mon, 2003-12-29 at 22:03, Greg Bruno wrote:
> > Pulled the distro. Burned it after checking md5's. Ok.
> > Booted/installed test
> > cluster, completely vanilla, just defaults.
>
> i'm assuming this is an x86 installation, yes?

Yes.

> > SSH is too slow. Wow. 5-10 seconds to log in.
>
> that is not the case on our clusters. in fact, we tested this on all
> three architectures and all three are 'fast'.

2 different clusters exhibited the same results. Fixed one by applying dnsmasq to one of them.

> > Ok, now out at a customer site with the disks.
> >
> > Unhappily discovered that the following are missing:
> >
> > a) md (e.g. Software RAID): Just try to build one. Anaconda will
> > happily let you do this ... though it will die in the formatting
> > stages. Dropping into the shell (Alt-F2) and looking for the md
> > module (lsmod) shows nothing. Insmod the md also doesn't do anything.
> > Catting /proc/devices shows no md as a character or block device.
> >
> > If md is really not there anymore, it should be removed from
> > anaconda, just like ...
> >
> > b) ext3. There is no ext3 available for the install.
> >
> > Also discovered how incredibly fragile anaconda is. In order to
> > install, you have to wipe the disks. It will not install if there is
> > an md (software raid) device, choosing instead to crap out after you
> > have entered in all the information. To say that this is annoying is
> > a slight understatement. This is an anaconda issue, not a ROCKS
> > issue, though as a result of this issue, ROCKS is less functional
> > than it could be.
>
> we'll look into the above two issues.

Thanks

> > I also noted that there is no xfs option. This means that I will
> > need to hack new kernels later on after the install.
>
> just curious, is xfs offered as an option on other redhat supported
> products?

Nope, nor will Redhat likely do this in the near/mid term. This is
fairly common knowledge. All the other major distros do offer it. I
hope that the defense of the current state isn't that "Redhat doesn't
support it". I might have misunderstood you, but Redhat is almost
completely disinterested in clusters, so Redhat supporting/not
supporting it is really not relevant.

Curiously, cAos, which is doing some of the similar things ROCKS is
doing in terms of recompiling packages sans Redhat trademarks, has XFS
and a number of other useful things in there.

Regardless, having ext2 or vfat as your only fs options simply is not
reasonable, as neither of these is really appropriate for very large
disks or big file systems.

> also (and i'm assuming this will be no consolation to you, but it may
> be to others), building a new kernel RPM is straightforward in rocks:
>
> http://www.rocksclusters.org/rocks-documentation/3.1.0/customization-kernel.html

I had been planning to use a similar approach to this. I was/am simply
quite surprised that the two options for ROCKS file systems are really
not very good, and the good choices are unavailable. In all fairness
this is more likely a constraint of anaconda than of ROCKS.

I fixed the ext2/ext3 by a reboot after a quick tune2fs session and some
fixup of the /etc/fstab.
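[For anyone hitting the same thing, the ext2-to-ext3 fixup Joe describes can be sketched roughly like this. It is an illustration, not his exact commands: /dev/sda1 is an assumed device name, and the fstab rewrite is demonstrated against a sample line rather than the live file.]

```shell
# Step 1 (shown as a comment since it needs a real partition):
#   tune2fs -j /dev/sda1    # add a journal, turning ext2 into ext3
# Step 2: flip the matching fstab entry from ext2 to ext3.
fix_fstab() {
    # rewrite "device ... ext2" to "device ... ext3" in fstab-style lines
    sed 's|^\(/dev/[a-z0-9]\{1,\}[[:space:]]\{1,\}[^[:space:]]\{1,\}[[:space:]]\{1,\}\)ext2|\1ext3|'
}
echo '/dev/sda1  /  ext2  defaults  1 1' | fix_fstab
# -> /dev/sda1  /  ext3  defaults  1 1
```

[On a real node you would run the sed in-place against /etc/fstab, after taking a backup, and then reboot.]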

I have to say that I get less and less impressed with anaconda as time
goes on.

I fixed the partitioning problem (anaconda dies when it runs in an md'ed
set of partitions) by wiping the disk and using knoppix to fdisk the
disks. Autopartitioning is not an option, as the default choices are
not all that good (another anaconda-ism).

> > - gb

From cdwan at mail.ahc.umn.edu Mon Dec 29 20:58:20 2003
From: cdwan at mail.ahc.umn.edu (Chris Dwan (CCGB))
Date: Mon, 29 Dec 2003 22:58:20 -0600 (CST)
Subject: [Rocks-Discuss]3.1.0 surprises
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]> <[email protected]>
Message-ID: <[email protected]>

I also encountered the Software RAID problem today. It made upgrading
an existing ROCKS cluster a little tricky.

Another behavior I noticed was that the CDs were not ejecting as the
node installs finished. It was manageable, but required watching to
prevent the endless reinstall cycle.

-Chris Dwan

From bruno at rocksclusters.org Mon Dec 29 21:48:22 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Mon, 29 Dec 2003 21:48:22 -0800
Subject: [Rocks-Discuss]3.1.0 surprises
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]> <[email protected]> <[email protected]>
Message-ID: <[email protected]>

> Another behavior I noticed was that the CDs were not ejecting as the
> node installs finished. It was manageable, but required watching to
> prevent the endless reinstall cycle.

actually, it isn't a problem as the last CD in the frontend will be a roll and rolls are not bootable.

- gb

From cdwan at mail.ahc.umn.edu Mon Dec 29 21:51:13 2003
From: cdwan at mail.ahc.umn.edu (Chris Dwan (CCGB))
Date: Mon, 29 Dec 2003 23:51:13 -0600 (CST)
Subject: [Rocks-Discuss]3.1.0 surprises
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]> <[email protected]> <[email protected]> <[email protected]>
Message-ID: <[email protected]>

> > Another behavior I noticed was that the CDs were not ejecting as the
> > node installs finished. It was manageable, but required watching to
> > prevent the endless reinstall cycle.
>
> actually, it isn't a problem as the last CD in the frontend will be a
> roll and rolls are not bootable.

You're right about the frontend. It was the compute nodes where it gave
me trouble. Roll disks never go in those.

-Chris Dwan

From landman at scalableinformatics.com Mon Dec 29 22:03:06 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Tue, 30 Dec 2003 01:03:06 -0500
Subject: [Rocks-Discuss]3.1.0 surprises
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]> <[email protected]> <[email protected]> <[email protected]>
Message-ID: <[email protected]>

What I had noticed is that some CD hardware does not eject when
prompting for swapping in the roll. I swapped hardware and that fixed
it. Rather odd. Seen this in 3 different systems. Worked ok with
previous ROCKS.

Is it possible to do something like a

frontend askmethod

akin to the "linux askmethod" and specifically have the ISO's online in
a directory somewhere? Just curious... I find it interesting that 10
years after swapping floppies for OS installs, I am now swapping CDs...
There is irony here somewhere.

On Tue, 2003-12-30 at 00:48, Greg Bruno wrote:
> > Another behavior I noticed was that the CDs were not ejecting as the
> > node installs finished. It was manageable, but required watching to
> > prevent the endless reinstall cycle.
>
> actually, it isn't a problem as the last CD in the frontend will be a
> roll and rolls are not bootable.
>
> - gb

From bruno at rocksclusters.org Mon Dec 29 22:28:45 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Mon, 29 Dec 2003 22:28:45 -0800
Subject: [Rocks-Discuss]3.1.0 surprises
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]> <[email protected]> <[email protected]> <[email protected]> <[email protected]>
Message-ID: <[email protected]>

> Is it possible to do something like a
>
> frontend askmethod
>
> akin to the "linux askmethod" and specifically have the ISO's online in
> a directory somewhere? Just curious...

the ability to install frontends remotely is at the top of our priority list for the next release.


> I find it interesting that 10 years after swapping floppies for OS
> installs, I am now swapping CDs... There is irony here somewhere.

sorry, i'm going to have to evangelize rolls a bit.

joe, do you not have just a bit of appreciation for rolls and what is going on under the sheets? we now have a formal way for you, that's right you, to augment the installation of a cluster. you get to programmatically interact with the installer at virtually any level. you get to tell the installer what bits you want it to lay down and how to configure them. and this is done completely independently of the core. the core has no idea of your bits, yet, it installs it and configures it to your specification.

for you, this could be having the 'scalable informatics' roll that contains all your RPMS and XML configuration files. this ISO image could be completely proprietary, yet, the installer installs it. you could ship your roll worldwide and every one of your customers would, within 2 hours, have a scalable informatics cluster online running the applications you sold them. and, you know it would be running because you embedded the correct configuration into the roll.

or, perhaps rolls work so smoothly, it just looks like CD swapping. :-)

- gb

From landman at scalableinformatics.com Mon Dec 29 22:50:30 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Tue, 30 Dec 2003 01:50:30 -0500
Subject: [Rocks-Discuss]3.1.0 surprises
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]> <[email protected]> <[email protected]> <[email protected]> <[email protected]> <[email protected]>
Message-ID: <[email protected]>

On Tue, 2003-12-30 at 01:28, Greg Bruno wrote:

> > There is irony here somewhere.
>
> sorry, i'm going to have to evangelize rolls a bit.
>
> joe, do you not have just a bit of appreciation for rolls and what is
> going on under the sheets? we now have a formal way for you, that's
> right you, to augment the installation of a cluster. you get to
> programmatically interact with the installer at virtually any level.
> you get to tell the installer what bits you want it to lay down and how
> to configure them. and this is done completely independently of the
> core. the core has no idea of your bits, yet, it installs it and
> configures it to your specification.


Actually I do have a pretty good appreciation for them. I see that they
are a different way of solving the problems I have been solving for a
while using "other methods"
(http://scalableinformatics.com/downloads/finishing/finishing-v3.1.0.tar.gz).
What I don't see is how to build them (yes, I did see the "source"
messages, and "cvs", ...).

The major issue for me is going to be anaconda, all its joy and bugs,
and what directions its use forces ROCKS to follow (vis-a-vis file
systems, etc).

> for you, this could be having the 'scalable informatics' roll that
> contains all your RPMS and XML configuration files. this ISO image
> could be completely proprietary, yet, the installer installs it. you
> could ship your roll worldwide and every one of your customers would,
> within 2 hours, have a scalable informatics cluster online running the
> applications you sold them. and, you know it would be running because
> you embedded the correct configuration into the roll.

This is a nice vision, though it is unfortunately a vision. The
customer would have to re-install the cluster head node when a new
version of the bits comes out. Right? This is simply not tenable for a
production cycle facility that needs to upgrade a package. Please let
me know if my understanding is incorrect, I would be quite happy to hear
this.

The "other method" that I developed doesn't have this as a problem. Just re-install the compute nodes, and load the RPM on the head nodes. In fact I built some tools which simplify both the "other method" andthe ROCKS method. As I have to worry about multiple different clusterdistros (not just ROCKS, sorry, customers get what they need/want), Ihave to worry about interfacing with that distro. So I have some tools(the auto-build scripts) which simplify adding/removing packages intothe extend-compute.xml.

What I am hoping for rolls are two things: 1) insertable/removable from
a live cluster without forcing a re-install of the head node (compute
nodes, that's fine, not the head nodes) and 2) simple documentation on
how to build them. If they are really quite simple, I see no reason I
could not have the same tool I use to automate the building of
installable RPMS for the other method actually emit a ROCKS roll. But I
need to know how to do this. I am not sure I have sufficient time to
"read the source, Luke" for this. I would be happy to do this given
time, and customer demand/need. The other method had that, hence its
development.

> > or, perhaps rolls work so smoothly, it just looks like CD swapping. :-)

My point was that after inserting the SGE roll, I had to get up from the
console, walk over to the unit, swap in the next roll, iterate....

Felt like CD swapping to me.

Rolls won't solve other problems which are anaconda specific (file
systems, partitioning, formatting, RAID, network detection, etc). As
there are multiple similar RHEL de-redhatifying efforts, some of which
are drastically improving the installation process (by not using
anaconda), are you folks looking to move away from anaconda any time
soon?

> > - gb
--

From bruno at rocksclusters.org Mon Dec 29 23:45:52 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Mon, 29 Dec 2003 23:45:52 -0800
Subject: [Rocks-Discuss]3.1.0 surprises
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]> <[email protected]> <[email protected]> <[email protected]> <[email protected]> <[email protected]> <[email protected]>
Message-ID: <[email protected]>

> This is a nice vision, though it is unfortunately a vision. The
> customer would have to re-install the cluster head node when a new
> version of the bits comes out. Right? This is simply not tenable for a
> production cycle facility that needs to upgrade a package. Please let
> me know if my understanding is incorrect, I would be quite happy to
> hear this.

we've talked about this on the list and we've talked with you about this in person. you know the above statement is true. you also know it is a future direction for rolls.

> What I am hoping for rolls are two things: 1) insertable/removable from
> a live cluster without forcing a re-install of the head node (compute
> nodes, thats fine, not the head nodes) 2) simple documentation on how
> to build. If they are really quite simple, I see no reason I could not
> take the same tool I use to automate the building of installable RPMS
> for the other method actually emit a ROCKS roll. But I need to know
> how to do this. I am not sure I have sufficient time to "read the
> source, Luke" for this. I would be happy to do this given time, and
> customer demand/need. The other method had that, hence its development.

a roll developer's guide is in progress. and, as stated above, adding rolls to a live frontend is on our roadmap.

> Rolls wont solve other problems which are anaconda specific (file
> systems, partitioning, formatting, RAID, network detection, etc).

not true. if you wish to get deeply involved with the red hat installer, you can develop a 'patch' roll that will change the installer to do as you wish.

> As there are multiple similar RHEL de-redhatifying efforts, some of
> which are drastically improving the installation process (by not using
> anaconda), are you folks looking to move away from anaconda any time
> soon?

please educate us -- where can we download these installers and find the developer guides that describe how to interact with the installer.

as for moving away from anaconda, i don't think that will happen anytime
soon. anaconda has served us well. we have all had issues with the
installer, but i would rather work with anaconda than reinvent it. the
boys and girls at redhat have a vested interest in detecting and
configuring the latest hardware and i plan on leveraging that.

of the issues you mention above, the only one we don't know how to control yet is file system selection (but, we will look into it per your earlier request). we already manipulate anaconda to partition and format the drives to our specifications, and we have ideas on how to handle RAID and network naming (which is what i think you mean by network detection).

- gb

From landman at scalableinformatics.com Tue Dec 30 00:55:37 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Tue, 30 Dec 2003 03:55:37 -0500
Subject: [Rocks-Discuss]3.1.0 surprises
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]> <[email protected]> <[email protected]> <[email protected]> <[email protected]> <[email protected]> <[email protected]> <[email protected]>
Message-ID: <[email protected]>

On Tue, 2003-12-30 at 02:45, Greg Bruno wrote:
> > This is a nice vision, though it is unfortunately a vision. The
> > customer would have to re-install the cluster head node when a new
> > version of the bits comes out. Right? This is simply not tenable for
> > a production cycle facility that needs to upgrade a package. Please
> > let me know if my understanding is incorrect, I would be quite happy
> > to hear this.
>
> we've talked about this on the list and we've talked with you about
> this in person. you know the above statement is true. you also know it
> is a future direction for rolls.

I was simply responding to the evangelism which seemed to imply the
functionality existed today. It doesn't, and we both agree that it is
necessary. Although the vision will provide innumerable benefits ...
ROCKS is not there yet, and won't be for a while.


That's ok though, as I have a reasonable workaround for some of these
issues. And when I can insert and delete rolls live into a cluster,
I'll modify my tools to emit rolls. Until then, it is as you said, a
vision for the future.

[...]

> a roll developer's guide is in progress. and, as stated above, adding
> rolls to a live frontend is on our roadmap.

Adding and removing are needed as we have discussed.

> > Rolls wont solve other problems which are anaconda specific (file
> > systems, partitioning, formatting, RAID, network detection, etc).
>
> not true. if you wish to get deeply involved with the red hat
> installer, you can develop a 'patch' roll that will change the
> installer to do as you wish.

I guess I am at a loss to understand what it is you are doing then. If
you are telling me I can hack around anaconda to my heart's content, why
do you tell me later on that ROCKS is deeply wedded to anaconda and will
not change soon? I will assume I am missing something here. Can I
replace anaconda? This is what I think you are saying. If you are
instead saying, no don't replace, just hack it, I am not sure I want to
do that. It is a very large and complex beast, with one system doing
the job of many. Jack of all trades.

More than half of the pain I have experienced deploying ROCKS is
directly attributable to anaconda. I would like to work around it. If
I can completely replace it under ROCKS this could be of interest. If I
cannot, and ROCKS will always remain closely tied to RedHat specific
technology (e.g. anaconda), that is also important to know.

> > As there are multiple similar RHEL de-redhatifying efforts, some of
> > which are drastically improving the installation process (by not
> > using anaconda), are you folks looking to move away from anaconda any
> > time soon?
>
> please educate us -- where can we download these installers and find
> the developer guides that describe how to interact with the installer.

If you are serious about this, I would be happy to help you find more
development info and help make introductions to some of the people doing
this stuff. If you are not serious about this, that's fine too.

> as for moving away from anaconda, i don't think that will happen
> anytime soon. anaconda has served us well. we have all had issues with
> the installer, but i would rather work with anaconda rather than
> reinvent it. the boys and girls at redhat have a vested interest in
> detecting and configuring the latest hardware and i plan on leveraging
> that.

Knoppix makes good use of the anaconda detection routines without using
anaconda. You do not need anaconda in its entirety for the detection
routines.


While Redhat has a vested interest in making sure it detects hardware
well, the software that does its installation has been getting more and
more fragile compared to other installation systems. Simple failures of
one item or the other in the SUSE YAST tool, or the Mandrake installer,
or for that matter, most of the non-anaconda based installers do not
force you to start over from the beginning. Stack traces are not given,
and you are not asked to debug an arcane and complex python program from
a highly limited command window. You are brought back to a well known
and well defined state, and you have a finite and non-zero chance of
recovering from the failure. This is different than the anaconda
experience, where the slightest hiccup, which would be trivially
correctable given the opportunity, results in a complete failure of the
process.

This has resulted in our discovery of the RH9/RHEL fragility and
sensitivity (and lack of ability) with respect to software raid,
partitioning, and related. This has wasted many hours of our collective
time, and cost those of us with software RAID systems the ability to use
the upgrade option.

As ROCKS depends critically upon this bit of technology that you
indicate later on is so important, ROCKS happens to share in its
pitfalls, even though these are not ROCKS problems. I am not sure if
you understand how much time I have to spend explaining to customers and
end users why what they are seeing are not ROCKS problems but Redhat
artifacts. Part of the reason I am raising this issue in this forum is
that I have spent altogether too much time trying to explain this to
various users.

> of the issues you mention above, the only one we don't know how to
> control yet is file system selection (but, we will look into it per
> your earlier request). we already manipulate anaconda to partition and
> format the drives to our specifications, and we have ideas on how to
> handle RAID and network naming (which is what i think you mean by
> network detection).

Network detection is

a) getting the right network driver config
   1) by detection
   2) from floppy/usb/whatever

b) getting the correct network interface ordering (what you call naming)

The point you (somewhat whimsically) made was that I could create
Scalable Informatics rolls and ship them around the world for people to
use in 2 hours. Great. Good vision, and that is something like what I
am looking at. I have that now with my tools, but I can always expand
their functionality.

Now the problem is, if after shipping out my roll, when my end users
install it, anaconda barfs in some new and exciting manner (has happened
already with the finishing scripts, and I have worked hard to try to
figure out what is broken in anaconda to work around its bugs), who are
the customers going to blame?

My experience thus far is that ROCKS is taking more than its fair share
of heat over bugs that it has nothing to do with.


From fds at sdsc.edu Tue Dec 30 05:53:48 2003
From: fds at sdsc.edu (fds at sdsc.edu)
Date: Tue, 30 Dec 2003 05:53:48 -0800 (PST)
Subject: [Rocks-Discuss]Extend-compute.xml issue, ln creation fails
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

Code in the <post> section of an xml file (extend-compute or otherwise)
can be almost anything. When the script is run, the environment is not
as full as usual, which is why we always recommend specifying the full
path to commands. As you saw, /bin and /usr/bin are in the path, so
certain things like "which sed" will work, for example.

Remember that everything in the eval tags gets run at kickstart
generation time (on the frontend). Everything else (the naked commands
in the post section) is run by the node being installed.
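[To make the timing split concrete, here is a minimal extend-compute.xml sketch; the commands are invented for illustration, not taken from any poster's actual file. The naked commands run on the installing node, the <eval> body runs on the frontend.]

```xml
<post>
<!-- runs on the compute node after packages are installed;
     note the full paths, since the environment is sparse -->
/bin/mkdir -p /scratch/tmp
/bin/chmod 1777 /scratch/tmp

<eval sh="bash">
<!-- runs on the frontend during kickstart generation -->
echo "# kickstart generated on $(/bin/hostname)"
</eval>
</post>
```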

We do intend for the heart of the customization to be performed at
kickstart time. I would be surprised if you had to postpone many tasks
until the node was up, although this does happen occasionally. The
globus and condor post configuration contain tasks that cannot be done
at install time.

Send us the scripts in question and we will take a look.

-Federico

> Are there any examples of Rocks 2.3.2 extend-compute.xml scripts that
> work? I need to know the limitations of the distribution. As far as I
> can tell the commands are available (`which command` locates the
> commands fine) but they don't necessarily perform the job as expected.
> I had seen the `eval...` clarification in the archives.
>
> As it stands I plan to mkdir, ln and echo in the extend-c... but then
> run the heart of the customization (scripted) once the nodes are up.
> It just doesn't seem to be what was intended.
>
> As always, thanks for your help
> --Reed

From purikk at hotmail.com Tue Dec 30 06:03:02 2003
From: purikk at hotmail.com (Purushotham Komaravolu)
Date: Tue, 30 Dec 2003 09:03:02 -0500
Subject: [Rocks-Discuss]Licensing
References: <[email protected]>
Message-ID: <[email protected]>

Hi All,
I would like to know the list of the components that have to be licensed
when we install ROCKS as a commercial solution.
Thanks
Happy Holidays
Puru

From doug at seismo.berkeley.edu Tue Dec 30 10:53:36 2003
From: doug at seismo.berkeley.edu (Doug Neuhauser)
Date: Tue, 30 Dec 2003 10:53:36 -0800 (PST)
Subject: [Rocks-Discuss]Rocks 3.1.0 install problems
Message-ID: <[email protected]>

I am having a problem upgrading Rocks 2.3.2 to 3.1.0.
Both my head node and compute nodes are dual XEON 2.4 GHz boxes.

We burned the CDs from the following images:
rocks-base-3.1.0.i386.iso
roll-hpc-3.1.0-0.i386.iso
roll-grid-3.1.0-0.any.iso
roll-intel-3.1.0-0.any.iso
roll-sge-3.1.0-0.any.iso

I verified the md5s both on the downloaded images from the rocks
web site and the md5s on the burned cds. They are fine.
I have run the upgrade several times -- at least once with all of the
rolls, and once with just the rocks base and hpc roll.

The head node installs with no problem using the command
frontend upgrade

I can login and run insert-ethers, telling it to look for compute nodes.

When I power on a compute node, it boots grub, selects the only
kernel on its local disk

Rocks Reinstall
and runs through the /sbin/loader.
The blue screen comes up, the compute node requests and receives a
dynamic IP address from the head node, but then within a few seconds
aborts with the messages:

install exited abnormally - received signal 11
sending termination signals ... done
sending kill signals ... done
disabling swap ...
unmounting filesystems ...
        /proc/bus/usb done
        /proc done
        /dev/pts done
You may safely reboot your system

It appears that the "Rocks Reinstall" kernel on the disk is not
compatible with Rocks 3.1.0. When I changed the compute node boot order
to perform a PXE boot before the hard disks, it properly downloads the
3.1.0 kernel from the head node, reformats the disk, and installs 3.1.0
properly. I have to catch it in the reboot, and change the boot order
to use the disk before PXE, or I get into an infinite loop.

Is there any better way to address this problem? The procedure of:
    set PXE boot first
    boot from net, install rocks 3.1.0 on disk
    reboot
    catch node during reboot, change boot order to floppy,disk,net
    reboot
for each node is tedious.

Did I do something wrong in how I shut my 2.3.2 cluster down before the
upgrade? If so, some notes about this in the install instructions would
be useful.

- Doug N

------------------------------------------------------------------------
Doug Neuhauser                    University of California, Berkeley
doug at seismo.berkeley.edu       Berkeley Seismological Laboratory
Phone: 510-642-0931               215 McCone Hall # 4760
Fax: 510-643-5811                 Berkeley, CA 94720-4760

From bruno at rocksclusters.org Tue Dec 30 11:29:14 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Tue, 30 Dec 2003 11:29:14 -0800
Subject: [Rocks-Discuss]Rocks 3.1.0 install problems
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

On Dec 30, 2003, at 10:53 AM, Doug Neuhauser wrote:

> I am having a problem upgrading Rocks 2.3.2 to 3.1.0.
> Both my head node and compute nodes are dual XEON 2.4 GHz boxes.
>
> We burned the CDs from the following images:
> rocks-base-3.1.0.i386.iso
> roll-hpc-3.1.0-0.i386.iso
> roll-grid-3.1.0-0.any.iso
> roll-intel-3.1.0-0.any.iso
> roll-sge-3.1.0-0.any.iso
> I verified the md5s both on the downloaded images from the rocks
> web site and the md5s on the burned cds. They are fine.
> I have run the upgrade several times -- at least once with all of the
> rolls, and once with just the rocks base and hpc roll.
>
> The head node installs with no problem using the command
> frontend upgrade
> I can login and run insert-ethers, telling it to look for compute
> nodes.
>
> When I power on a compute node, it boots grub, selects the only
> kernel on its local disk
> Rocks Reinstall
> and runs through the /sbin/loader.
> The blue screen comes up, the compute node requests and receives a
> dynamic IP address from the head node, but then within a few seconds
> aborts with the messages:
> install exited abnormally - received signal 11
> sending termination signals ... done
> sending kill signals ... done
> disabling swap ...
> unmounting filesystems ...
> /proc/bus/usb done
> /proc done
> /dev/pts done
> You may safely reboot your system
>
> It appears that the "Rocks Reinstall" kernel on the disk is not
> compatible with Rocks 3.1.0. When I changed the compute node boot
> order to perform a PXE boot before the hard disks, it properly
> downloads the 3.1.0 kernel from the head node, reformats the disk, and
> installs 3.1.0 properly. I have to catch it in the reboot, and change
> the boot order to use the disk before PXE, or I get into an infinite
> loop.
>
> Is there any better way to address this problem? The procedure of:
> set PXE boot first
> boot from net, install rocks 3.1.0 on disk
> reboot
> catch node during reboot, change boot order to floppy,disk,net
> reboot
> for each node is tedious.
>
> Did I do something wrong in how I shut my 2.3.2 cluster down before the
> upgrade? If so, some notes about this in the install instructions
> would be useful.

you're right, the 2.3.2 installer (anaconda from redhat's version 7.3)
is not compatible with the installer on rocks 3.1 (anaconda from
redhat's enterprise linux 3.0).

the way you will have to reinstall your cluster is one of two ways:

1) if your compute nodes support PXE that is enabled from the keyboard -- that is, when you boot the node, in BIOS you see a message that says "Press F12 for Network Boot (PXE)". if your nodes have that, then you'll have to boot the nodes, one by one and, when you see the message, press the F12 key, then move to the next node.

2) use the rocks base CD to boot each compute node. when insert-ethers reports that it discovered the node, take the CD out and put it in the next compute node.

but, if your compute nodes were initially installed with PXE, the
fastest way to upgrade the compute nodes is to simply turn all the
compute nodes off, upgrade the frontend, run insert-ethers, then turn
the compute nodes on one by one. the compute nodes should be set for
PXE boot, which will pull the installer from the frontend and therefore
use the updated installer.

as you state above, we need to document this.

thanks for the bug report.

- gb


From doug at seismo.berkeley.edu Tue Dec 30 11:45:59 2003
From: doug at seismo.berkeley.edu (Doug Neuhauser)
Date: Tue, 30 Dec 2003 11:45:59 -0800 (PST)
Subject: [Rocks-Discuss]Rocks 3.1.0 install problems
Message-ID: <[email protected]>

Greg,

1. I don't have cdroms on my compute nodes, only floppy. :(
2. My boot order on the compute nodes is normally:
   floppy, disk, PXE
3. I don't have a hot-key override to force PXE boot. I have to change
   the BIOS boot order to enable PXE boot.

> but, if your compute nodes were initially installed with PXE, the
> fastest way to upgrade the compute nodes is to simply turn all the
> compute nodes off, upgrade the frontend, run insert-ethers, then turn
> the compute nodes on one by one. the compute nodes should be set for
> PXE boot which will pull the installer from the frontend and therefore
> be updated installer.

I don't understand this.

I can't leave the compute nodes with PXE boot first, or it will create
an endless loop. The compute node will boot via PXE, install rocks
3.1.0, and then reboot via PXE and repeat the process ad nauseam.

Can I use the old floppy boot image found at:
ftp://rocksclusters.org/pub/rocks/current/i386/bootnet.img

to force a network boot?

The 3.1.0 online manual has a link in the section
"1.3 Install your Compute Nodes"
to ftp://www.rocksclusters.org/pub/rocks/bootnet.img
but this does not exist.

- Doug N
------------------------------------------------------------------------
Doug Neuhauser                University of California, Berkeley
doug at seismo.berkeley.edu   Berkeley Seismological Laboratory
Phone: 510-642-0931           215 McCone Hall # 4760
Fax: 510-643-5811             Berkeley, CA 94720-4760

From junkscarce at hotmail.com Tue Dec 30 11:57:16 2003
From: junkscarce at hotmail.com (Reed Scarce)
Date: Tue, 30 Dec 2003 19:57:16 +0000
Subject: [Rocks-Discuss]Extend-compute.xml issue, ln creation fails
Message-ID: <[email protected]>

I tested your echo ... wait and ln wait... S11wait lines. They worked perfectly. Then I tried the same with gpm and left wait in the script. Wait worked as before, and gpm didn't work - like before. I've given up on doing anything very fancy and have started to make a script to run the first time it boots, with hand removal.

Thanks for the perspective,--Reed

>From: Dave Lane <dlane at ap.stmarys.ca>
>To: "Reed Scarce" <junkscarce at hotmail.com>
>CC: npaci-rocks-discussion at sdsc.edu
>Subject: Re: [Rocks-Discuss]Extend-compute.xml issue, ln creation fails
>Date: Mon, 29 Dec 2003 19:44:23 -0400
>
>At 11:15 PM 12/29/2003 +0000, Reed Scarce wrote:
>>Are there any examples of Rocks 2.3.2 extend-compute.xml scripts that
>>work?
>
>Reed,
>
>Below is a script that worked fine for me (with 2.3.2). What it does should
>be fairly explanatory...Dave
>
>---
>
><post>
> <!-- Insert your post installation script here. This
> code will be executed on the destination node after the
> packages have been installed. Typically configuration files
> are built and services setup in this section. -->
>
>mv /usr/local /usr/local-old
>ln -s /home/local /usr/local
>ln -s /home/opt/intel /opt/intel
>ln -s /home/disc15 /disc15
>mkdir /scratch/tmp
>chmod 1777 /scratch/tmp
>echo '#!/bin/bash' > /etc/init.d/wait
>echo 'sleep 60' >> /etc/init.d/wait
>chmod +x /etc/init.d/wait
>ln -s /etc/init.d/wait /etc/rc3.d/S11wait
>ln -s /etc/init.d/wait /etc/rc4.d/S11wait
>ln -s /etc/init.d/wait /etc/rc5.d/S11wait
>
> <eval sh="python">
> <!-- This is python code that will be executed on
> the frontend node during kickstart generation. You
> may contact the database, make network queries, etc.
> These sections are generally used to help build
> more complex configuration files.
> The 'sh' attribute may point to any language interpreter
> such as "bash", "perl", "ruby", etc.
> -->
> </eval>
></post>

From landman at scalableinformatics.com Tue Dec 30 12:01:44 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Tue, 30 Dec 2003 15:01:44 -0500
Subject: [Rocks-Discuss]Rocks 3.1.0 install problems
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

Hi Doug:

As long as pxe is in there, you should be able to do this
(semi-)automatically. All you need to do is to wipe the partition
tables and boot sectors of the compute nodes. I seem to remember a
really simple single-floppy tool that did this.

See http://paud.sourceforge.net/ and http://dban.sourceforge.net/

I think dban is the right one. After that (only on compute nodes) you
should be able to pxe boot.

Joe

On Tue, 2003-12-30 at 14:45, Doug Neuhauser wrote:
> Greg,
>
> 1. I don't have cdroms on my compute nodes, only floppy. :(
> 2. My boot order on the compute nodes is normally:
>    floppy, disk, PXE
> 3. I don't have a hot-key override to force PXE boot.
>    I have to change the BIOS boot order to enable PXE boot.
>
> > but, if your compute nodes were initially installed with PXE, the
> > fastest way to upgrade the compute nodes is to simply turn all the
> > compute nodes off, upgrade the frontend, run insert-ethers, then turn
> > the compute nodes on one by one. the compute nodes should be set for
> > PXE boot which will pull the installer from the frontend and therefore
> > be updated installer.
>
> I don't understand this.
>
> I can't leave the compute nodes with PXE boot first, or it will create an
> endless loop. The compute node will boot via PXE, install rocks 3.1.0,
> and then reboot via PXE and repeat the process ad-nauseum.
>
> Can I use the old floppy boot image found at:
> ftp://rocksclusters.org/pub/rocks/current/i386/bootnet.img
> to force a network boot?
>
> The 3.1.0 online manual has a link in the section
> 1.3 Install your Compute Nodes
> to ftp://www.rocksclusters.org/pub/rocks/bootnet.img
> but this does not exist.
>
> - Doug N
> ------------------------------------------------------------------------
> Doug Neuhauser                University of California, Berkeley
> doug at seismo.berkeley.edu   Berkeley Seismological Laboratory
> Phone: 510-642-0931           215 McCone Hall # 4760
> Fax: 510-643-5811             Berkeley, CA 94720-4760

From bruno at rocksclusters.org Tue Dec 30 12:07:34 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Tue, 30 Dec 2003 12:07:34 -0800
Subject: [Rocks-Discuss]Rocks 3.1.0 install problems
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

On Dec 30, 2003, at 11:45 AM, Doug Neuhauser wrote:

> Greg,
>
> 1. I don't have cdroms on my compute nodes, only floppy. :(
> 2. My boot order on the compute nodes is normally:
>    floppy, disk, PXE
> 3. I don't have a hot-key override to force PXE boot.
>    I have to change the BIOS boot order to enable PXE boot.
>
>> but, if your compute nodes were initially installed with PXE, the
>> fastest way to upgrade the compute nodes is to simply turn all the
>> compute nodes off, upgrade the frontend, run insert-ethers, then turn
>> the compute nodes on one by one. the compute nodes should be set for
>> PXE boot which will pull the installer from the frontend and therefore
>> be updated installer.
>
> I don't understand this.

i'll try to give a better explanation.

when compute nodes are installed via PXE, rocks detects this and manipulates the boot sector of the disk drive on the compute node to make the disk non-bootable. that way, if the compute node is reset, it will try to PXE boot. it will PXE boot even if your boot order is: hard disk, cd/floppy, PXE. this occurs because the hard disk is non-bootable, so the BIOS boot loader will skip the hard disk and move on to the other boot devices.
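[Editor's note: the mechanism gb describes can be illustrated with a small sketch. A PC BIOS checks the two-byte 0x55 0xAA signature at the end of sector 0 before treating a disk as bootable, so clearing sector 0 forces the fall-through to the next boot device such as PXE. The sketch below operates on a scratch file rather than a real disk; it is an illustration of the general idea, not the exact code Rocks runs.]

```shell
# Illustration only: uses a file-backed "disk" instead of e.g. /dev/hda.
img=$(mktemp)

# Build a fake MBR: 510 data bytes plus the 0x55 0xAA boot signature
# (octal \125 \252) in the last two bytes of the sector.
dd if=/dev/zero of="$img" bs=510 count=1 2>/dev/null
printf '\125\252' >> "$img"
tail -c 2 "$img" | od -An -tx1     # a BIOS would treat this disk as bootable

# Clear sector 0; with no signature, the BIOS skips the disk and
# falls through to the next boot device (e.g. PXE).
dd if=/dev/zero of="$img" bs=512 count=1 conv=notrunc 2>/dev/null
tail -c 2 "$img" | od -An -tx1     # signature gone: disk is now "non-bootable"

rm -f "$img"
```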

> I can't leave the compute nodes with PXE boot first, or it will create an
> endless loop. The compute node will boot via PXE, install rocks 3.1.0,
> and then reboot via PXE and repeat the process ad-nauseum.
>
> Can I use the old floppy boot image found at:
> ftp://rocksclusters.org/pub/rocks/current/i386/bootnet.img
> to force a network boot?
>
> The 3.1.0 online manual has a link in the section
> 1.3 Install your Compute Nodes
> to ftp://www.rocksclusters.org/pub/rocks/bootnet.img
> but this does not exist.

we are no longer supporting the boot floppy as it was problematic to make one that contained the appropriate device drivers that worked on most compute nodes.

- gb

From doug at seismo.berkeley.edu Tue Dec 30 12:28:46 2003
From: doug at seismo.berkeley.edu (Doug Neuhauser)
Date: Tue, 30 Dec 2003 12:28:46 -0800 (PST)
Subject: [Rocks-Discuss]Rocks 3.1.0 install problems
Message-ID: <[email protected]>

Greg,

Thanks for the detailed boot/reboot explanation. My problem dates
back to my initial rocks 2.3.2 installation. My compute node
motherboards have 3 ethernet interfaces (1 100Mb, 2 1Gb), but initially
only the 100 Mb supported PXE. When I used that for PXE boot, Linux
would then remap the interfaces so that it tried to use one of the Gbit
interfaces on the next reboot. Needless to say, the head node did not
respond to DHCP because the MAC address was unknown to it.

My solution was to get a new BIOS from Tyan that supported PXE on
all interfaces. However, since my cluster was initially installed using
the boot floppy, my compute nodes have the vestiges of floppy boot config,
not PXE boot config.

I'll try Joe Landman's suggestion of a scrub floppy to scrub the boot
sector of the boot disk on the compute nodes. If I can't do that, I
CAN go through the manual process of setting and resetting the boot
order on each compute node, but it is a slow and sequential process.

- Doug N
------------------------------------------------------------------------
Doug Neuhauser                University of California, Berkeley
doug at seismo.berkeley.edu   Berkeley Seismological Laboratory
Phone: 510-642-0931           215 McCone Hall # 4760
Fax: 510-643-5811             Berkeley, CA 94720-4760

From sjenks at uci.edu Tue Dec 30 12:37:26 2003
From: sjenks at uci.edu (Stephen Jenks)
Date: Tue, 30 Dec 2003 12:37:26 -0800
Subject: [Rocks-Discuss]Rocks 3.1.0 install problems
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]>
Message-ID: <[email protected]>

On Dec 30, 2003, at 12:07 PM, Greg Bruno wrote:
> when compute nodes are installed via PXE, rocks detects this and
> manipulates the boot sector of the disk drive on the compute node that
> makes the disk non-bootable. that way, if the compute node is reset,
> it will try to PXE boot. it will PXE boot even if your boot order is:
> hard disk, cd/floppy, PXE. this occurs because the hard disk is
> non-bootable so the BIOS boot loader will skip the hard disk and move
> on to the other boot devices.

Hi Greg, et al.

Is there any way to force this behavior even if I initially used a CD
to install the compute nodes? My nodes are capable of PXE boot, but
since I didn't use that, I presume they didn't do the non-bootable disk
trick upon install. Now that I'm clear about how the PXE install works,
I'd prefer to move to that, but don't really want to have to corrupt
the disks to cause the PXE install.

The nodes are currently loaded with 3.0, so perhaps that will work with 3.1's kickstart, but I'm curious about the PXE issue.

Thanks,

Steve Jenks

From bruno at rocksclusters.org Tue Dec 30 12:48:08 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Tue, 30 Dec 2003 12:48:08 -0800
Subject: [Rocks-Discuss]Rocks 3.1.0 install problems
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]> <[email protected]>
Message-ID: <[email protected]>

On Dec 30, 2003, at 12:37 PM, Stephen Jenks wrote:

> On Dec 30, 2003, at 12:07 PM, Greg Bruno wrote:
>> when compute nodes are installed via PXE, rocks detects this and
>> manipulates the boot sector of the disk drive on the compute node
>> that makes the disk non-bootable. that way, if the compute node is
>> reset, it will try to PXE boot. it will PXE boot even if your boot
>> order is: hard disk, cd/floppy, PXE. this occurs because the hard
>> disk is non-bootable so the BIOS boot loader will skip the hard disk
>> and move on to the other boot devices.
>
> Hi Greg, et al.
>
> Is there any way to force this behavior even if I initially used a CD
> to install the compute nodes? My nodes are capable of PXE boot, but
> since I didn't use that, I presume they didn't do the non-bootable
> disk trick upon install. Now that I'm clear about how the PXE install
> works, I'd prefer to move to that, but don't really want to have to
> corrupt the disks to cause the PXE install.
>
> The nodes are currently loaded with 3.0, so perhaps that will work
> with 3.1's kickstart, but I'm curious about the PXE issue.

3.0 is based on redhat 7.3 and 3.1 is based on redhat enterprise linux 3.0 -- so you'll hit a similar problem as doug did when you perform an upgrade.

give me a bit of time to cook up a procedure for forcing your compute nodes to PXE boot.

- gb

From cdwan at mail.ahc.umn.edu Tue Dec 30 14:22:18 2003
From: cdwan at mail.ahc.umn.edu (Chris Dwan (CCGB))
Date: Tue, 30 Dec 2003 16:22:18 -0600 (CST)
Subject: [Rocks-Discuss]NIS outside, 411 inside?
Message-ID: <[email protected]>

Is there a preferred way to have the 411 server on the head node replicate
information (passwd and auto.whatever) from an external NIS server to the
compute nodes? It seems to me that a cron job like the one below does the
trick, but it feels crufty to me:

    ypcat passwd > yp.passwd
    cat /etc/passwd yp.passwd > 411.passwd
    ** build the 411 distributed passwd from the file above instead of
    ** /etc/passwd.

I'd love to hear suggestions for a more elegant solution.

-Chris Dwan The University of Minnesota

From bruno at rocksclusters.org Tue Dec 30 15:16:36 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Tue, 30 Dec 2003 15:16:36 -0800
Subject: [Rocks-Discuss]Rocks 3.1.0 install problems
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]> <[email protected]>
Message-ID: <[email protected]>

On Dec 30, 2003, at 12:37 PM, Stephen Jenks wrote:

> On Dec 30, 2003, at 12:07 PM, Greg Bruno wrote:
>> when compute nodes are installed via PXE, rocks detects this and
>> manipulates the boot sector of the disk drive on the compute node
>> that makes the disk non-bootable. that way, if the compute node is
>> reset, it will try to PXE boot. it will PXE boot even if your boot
>> order is: hard disk, cd/floppy, PXE. this occurs because the hard
>> disk is non-bootable so the BIOS boot loader will skip the hard disk
>> and move on to the other boot devices.
>
> Hi Greg, et al.
>
> Is there any way to force this behavior even if I initially used a CD
> to install the compute nodes? My nodes are capable of PXE boot, but
> since I didn't use that, I presume they didn't do the non-bootable
> disk trick upon install. Now that I'm clear about how the PXE install
> works, I'd prefer to move to that, but don't really want to have to
> corrupt the disks to cause the PXE install.
>
> The nodes are currently loaded with 3.0, so perhaps that will work
> with 3.1's kickstart, but I'm curious about the PXE issue.

here's a procedure to ensure that your non-3.1.0 compute nodes PXE
install after a frontend upgrade.

this assumes your compute nodes support PXE installs.

before you upgrade the frontend, login to the frontend and execute:

# ssh-agent $SHELL
# ssh-add

# cluster-fork 'touch /boot/grub/pxe-install'

# cluster-fork '/boot/kickstart/cluster-kickstart --start'

# cluster-fork '/sbin/chkconfig --del rocks-grub'

now you can shutdown your compute nodes.

then upgrade your frontend.

after you login to your new frontend, run insert-ethers, then reset each compute node, one at a time.

doug, you'll have a bit harder time.

if you can find a bootable floppy, after the compute node boots, you can chroot to the root partition on the disk and run the three cluster-fork commands above.

i apologize for making this procedure tough on you.

- gb

From mjk at sdsc.edu Tue Dec 30 15:32:20 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Tue, 30 Dec 2003 15:32:20 -0800
Subject: [Rocks-Discuss]Licensing
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]>
Message-ID: <[email protected]>

Nothing!

Rocks is entirely open source with various GNU, BSD, Artistic, etc. open source licenses attached. The underlying RedHat OS (as of Rocks 3.1.0 -- available now) is recompiled from RedHat's publicly available SRPMS. You are of course welcome to send us money and hardware to help further the cause. Several vendors do in fact do this, and it helps us support them.

-mjk

On Dec 30, 2003, at 6:03 AM, Purushotham Komaravolu wrote:

> Hi All,

> I would like to know the list of the components that have
> to be licensed, when we install ROCKS as a commercial solution.
> Thanks
> Happy Holidays
> Puru

From mjk at sdsc.edu Tue Dec 30 15:35:39 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Tue, 30 Dec 2003 15:35:39 -0800
Subject: [Rocks-Discuss]NIS outside, 411 inside?
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

As of Rocks 3.1.0 we no longer use NIS "inside" the cluster. So in some ways this job is simpler now, although no one has done this yet. A simple ypcat like you have will do most of the right thing, and 411 will pick up the changes and send them around the cluster. But you need to figure out how to merge the cluster information with the external NIS information. This will include things like the IP addresses for the cluster compute nodes.

-mjk

On Dec 30, 2003, at 2:22 PM, Chris Dwan (CCGB) wrote:

> Is there a preferred way to have the 411 server on the head node replicate
> information (passwd and auto.whatever) from an external NIS server to the
> compute nodes? It seems to me that a cron job like the one below does the
> trick, but it feels crufty to me:
>
>     ypcat passwd > yp.passwd
>     cat /etc/passwd yp.passwd > 411.passwd
>     ** build the 411 distributed passwd from the file above instead of
>     ** /etc/passwd.
>
> I'd love to hear suggestions for a more elegant solution.
>
> -Chris Dwan
>  The University of Minnesota

From mitchskin at comcast.net Tue Dec 30 17:13:44 2003
From: mitchskin at comcast.net (Mitchell Skinner)
Date: Tue, 30 Dec 2003 17:13:44 -0800
Subject: [Rocks-Discuss]Rocks 3.1.0 install problems
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <1072833146.8645.1114.camel@zeitgeist>

On Tue, 2003-12-30 at 12:28, Doug Neuhauser wrote:
> I'll try Joe Landman's suggestion of a scrub floppy to scrub the boot
> sector of the boot disk on the compute nodes. If I can't do that, I
> CAN go through the manual process of setting and resetting the boot
> order on each compute node, but it is a slow and sequential process.

Something I'm going to try and implement at our site is support for the
pxelinux 'localboot' option. If the hard drives have a valid boot
sector, I can leave the BIOS set to PXE boot before the hard drive, and
by changing the pxelinux configuration on the head node, I can set a
particular node to boot from the network or from the local disk. In
other words, when a node PXE boots, it might get either the kickstart
instructions or the 'boot from hard drive' instructions.

That will take some fiddling, I think, because the head node then has to
maintain some more state for all of the compute nodes. I really want to
avoid going through the BIOS setup on all my nodes more than once,
though.
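[Editor's note: the scheme above maps onto pxelinux's per-node configuration lookup: pxelinux searches pxelinux.cfg/ for a file named after the node's IP address in hex before falling back to "default", so the frontend can hand each node either a localboot entry or an install entry. A hedged sketch of the two configurations follows; the paths and kickstart arguments are illustrative, not the actual Rocks layout.]

```
# /tftpboot/pxelinux.cfg/default -- normal operation: hand control
# back to the local disk's boot sector.
DEFAULT localdisk
LABEL localdisk
    LOCALBOOT 0

# A per-node file (e.g. pxelinux.cfg/C0A80102 for 192.168.1.2),
# dropped in place only when that node should reinstall:
DEFAULT install
LABEL install
    KERNEL vmlinuz
    APPEND initrd=initrd.img ks
```

Switching a node between "reinstall" and "run" then only requires creating or removing its per-node file on the head node, with no BIOS visits.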

Is this something that the ROCKS mainline would be interested in?

Mitch

From doug at seismo.berkeley.edu Tue Dec 30 17:51:49 2003
From: doug at seismo.berkeley.edu (Doug Neuhauser)
Date: Tue, 30 Dec 2003 17:51:49 -0800 (PST)
Subject: [Rocks-Discuss]Rocks 3.1.0 install problems
Message-ID: <[email protected]>

My solution to force PXE boot is outlined below.

1. Boot dban floppy (floppy image at http://dban.sourceforge.net/ ).

2. Run "quick" purge of disks on system (I only have 1 disk on compute nodes). I let the disk purge get far enough into the disk to overwrite the boot sectors and filesystem -- I didn't wait for it to completely erase the entire disk.

3. Reset the system, and CYCLE POWER on the compute node.

NOTE: If you don't cycle power, the BIOS sees the disk, but reports
that it has a fatal error reading from it. This caused the following
problems:

   a. PXE boot worked, but the Rocks install also did not see the disk.
      It asked whether you want to manually configure the disk, but the
      configuration failed immediately regardless of whether I answered
      yes or no. The Rocks developers may want to look into this bug.

   b. By the time that I figured out that I needed to cycle power, the
      BIOS had already removed the disk from the boot order. My boot
      order was now: floppy, PXE, disk. Rocks installed properly once,
      twice, .... until I reset the boot order to: floppy, disk, PXE.

4. Compute node will now perform PXE boot, install Rocks 3.1.0, and
   subsequent "controlled reboots" will boot from disk. If the node
   is powered down or reset with the reset button, no boot block is
   left on disk, and the system will perform PXE boot and reinstall
   Rocks.

------------------------------------------------------------------------
Doug Neuhauser                University of California, Berkeley
doug at seismo.berkeley.edu   Berkeley Seismological Laboratory
Phone: 510-642-0931           215 McCone Hall # 4760
Fax: 510-643-5811             Berkeley, CA 94720-4760

From tim.carlson at pnl.gov Tue Dec 30 19:17:11 2003
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Tue, 30 Dec 2003 19:17:11 -0800 (PST)
Subject: [Rocks-Discuss]Rocks 3.1.0 install problems
In-Reply-To: <[email protected]>
Message-ID: <[email protected]>

On Tue, 30 Dec 2003, Doug Neuhauser wrote:

> 2. Run "quick" purge of disks on system (I only have 1 disk on compute nodes).> I let the disk purge get far enough into the disk to overwrite the boot> sectors and filesystem -- I didn't wait for it to completely erase the> entire disk.

Here is something that is a bit quicker

cluster-fork dd if=/dev/zero of=/dev/hda bs=1k count=512

Then either power cycle or

cluster-fork reboot

Tim

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support

From cdwan at mail.ahc.umn.edu Tue Dec 30 19:44:11 2003
From: cdwan at mail.ahc.umn.edu (Chris Dwan (CCGB))
Date: Tue, 30 Dec 2003 21:44:11 -0600 (CST)
Subject: [Rocks-Discuss]NIS outside, 411 inside?
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]>
Message-ID: <[email protected]>

> As of Rocks 3.1.0 we no longer use NIS "inside" the cluster. So in
> some ways this job is simpler now, although no one has done this yet.
> A simple ypcat like you have will do most of the right thing and 411
> will pick up the changed and send them around the cluster. But, you
> need to figure out how to merge the cluster information with the
> external NIS information. This will include things like the IP address
> for the cluster compute nodes.

The shuffling below would work, I think, but it still gives me the
willies to be mucking with the passwd file every hour:

    mv /etc/passwd /etc/passwd.local
    ypcat passwd > /etc/passwd.nis
    cat /etc/passwd.local /etc/passwd.nis > /etc/passwd
    service 411 commit
    cp /etc/passwd.local /etc/passwd

Am I missing the simple way? I seem to have an affinity for finding the
maximally complex way to do things...

-Chris Dwan The University of Minnesota

From mjk at sdsc.edu Tue Dec 30 19:58:43 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Tue, 30 Dec 2003 19:58:43 -0800
Subject: [Rocks-Discuss]NIS outside, 411 inside?
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]> <[email protected]>
Message-ID: <[email protected]>

This sounds reasonable, but you still have a chance of conflicting UIDs in your password file. If you only issue accounts from your LAN NIS server then you should be fine. I'd suggest adding the accounts created by Rocks into your server (just look at the initial passwd file). The SGE roll creates an SGE user; others may also exist.

You can also try setting up your frontend as an NIS client of your external server, with the same UID issues above.

The bad news is we don't have a canned answer, and need someone to give us one. The good news is that with 411 in place only the frontend need be changed, and the compute nodes will still function as stock Rocks.

-mjk

On Dec 30, 2003, at 7:44 PM, Chris Dwan (CCGB) wrote:

>> As of Rocks 3.1.0 we no longer use NIS "inside" the cluster. So in
>> some ways this job is simpler now, although no one has done this yet.
>> A simple ypcat like you have will do most of the right thing and 411
>> will pick up the changed and send them around the cluster. But, you
>> need to figure out how to merge the cluster information with the
>> external NIS information. This will include things like the IP address
>> for the cluster compute nodes.
>
> The shuffling below would work, I think, but it still gives me the
> willies to be mucking with the passwd file every hour:
>
>     mv /etc/passwd /etc/passwd.local
>     ypcat passwd > /etc/passwd.nis
>     cat /etc/passwd.local /etc/passwd.nis > /etc/passwd
>     service 411 commit
>     cp /etc/passwd.local /etc/passwd
>
> Am I missing the simple way? I seem to have an affinity for finding the
> maximally complex way to do things...
>
> -Chris Dwan
>  The University of Minnesota

From csamuel at vpac.org Tue Dec 30 19:59:51 2003
From: csamuel at vpac.org (Chris Samuel)
Date: Wed, 31 Dec 2003 14:59:51 +1100
Subject: [Rocks-Discuss]NIS outside, 411 inside?
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]> <[email protected]>
Message-ID: <[email protected]>

On Wed, 31 Dec 2003 02:44 pm, Chris Dwan (CCGB) wrote:

> mv /etc/passwd /etc/passwd.local
> ypcat passwd > /etc/passwd.nis
> cat /etc/passwd.local /etc/passwd.nis > /etc/passwd
> service 411 commit
> cp /etc/passwd.local /etc/passwd

Hmm, how about:

    ypcat passwd > /etc/passwd.nis
    cat /etc/passwd /etc/passwd.nis > /etc/passwd.tmp
    cp /etc/passwd /etc/passwd.local
    mv /etc/passwd.tmp /etc/passwd
    service 411 commit
    mv /etc/passwd.local /etc/passwd

That should mean that you're never operating without a password file and the overwrites should be approaching atomic (I hope).

Of course, it'd be nice if you could do whatever the 411 init file does on something other than /etc/passwd :-)

Disclaimer: I have not tried this myself & don't (yet) have a 3.1 system to test with, caveat emptor, batteries not included, IANAL, etc.

cheers!
Chris
- --
 Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin
 Victorian Partnership for Advanced Computing http://www.vpac.org/
 Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia

From csamuel at vpac.org Tue Dec 30 20:01:39 2003
From: csamuel at vpac.org (Chris Samuel)
Date: Wed, 31 Dec 2003 15:01:39 +1100
Subject: [Rocks-Discuss]NIS outside, 411 inside?
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]> <[email protected]>
Message-ID: <[email protected]>

On Wed, 31 Dec 2003 02:59 pm, Chris Samuel wrote:

> cp /etc/passwd /etc/passwd.local

should be:

cp -p /etc/passwd /etc/passwd.local

Oh, and what happens if users overlap ? :-)
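[Editor's note: one way to make overlapping users harmless when merging the two passwd files is to let the first file win, keyed on the username. The sketch below is an illustration with made-up demo entries, assuming local accounts should take precedence over NIS entries; on a real frontend the inputs would be /etc/passwd and the ypcat output.]

```shell
# Demo inputs: a local passwd fragment and an NIS fragment that both
# define a user "sge" with different UIDs.
cat > passwd.local <<'EOF'
root:x:0:0:root:/root:/bin/bash
sge:x:400:400:SGE user:/opt/sge:/bin/bash
EOF
cat > passwd.nis <<'EOF'
sge:x:1000:1000:conflicting NIS entry:/home/sge:/bin/bash
alice:x:501:501:Alice:/home/alice:/bin/bash
EOF

# Keep only the first line seen for each username (field 1), so the
# local "sge" (UID 400) shadows the NIS duplicate.
awk -F: '!seen[$1]++' passwd.local passwd.nis > passwd.merged
cat passwd.merged

rm -f passwd.local passwd.nis passwd.merged
```

Note this only guards against duplicate usernames; two different names sharing one numeric UID would still need manual attention.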

cheers,
Chris
- --
 Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin
 Victorian Partnership for Advanced Computing http://www.vpac.org/
 Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia

From cdwan at mail.ahc.umn.edu Tue Dec 30 20:12:34 2003
From: cdwan at mail.ahc.umn.edu (Chris Dwan (CCGB))
Date: Tue, 30 Dec 2003 22:12:34 -0600 (CST)
Subject: [Rocks-Discuss]NIS outside, 411 inside?
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]> <[email protected]> <[email protected]>
Message-ID: <[email protected]>

> Of course, it'd be nice if you could do whatever the 411 init file does on
> something else than /etc/passwd :-)

That would be a really big step. I'm deeply wary of cron jobs that
overwrite my passwd file.

The next step might be to put this functionality into 411 itself. It
would be truly cool to have an automatic, non-NIS way to make the passwd,
group, autofs, and host lookup stuff be consistent and static across the
cluster nodes.

On the other hand, I appreciate that this is probably a complex enough
system without trying to reinvent NIS but leave out the brittle server
bits. We can work around for the time being.

-Chris Dwan

From doug at seismo.berkeley.edu Tue Dec 30 20:34:25 2003
From: doug at seismo.berkeley.edu (Doug Neuhauser)
Date: Tue, 30 Dec 2003 20:34:25 -0800 (PST)
Subject: [Rocks-Discuss]Mozilla / ssh DISPLAY problem with Rocks 3.1.0
Message-ID: <[email protected]>

I am having a problem using mozilla with the default Rocks monitor web page
over an ssh session to my headnode from a Sun workstation with a 24-bit
display. My workstation is a Sun Blade 150 running Solaris 8, and I am
using SSH Secure Shell 3.2.5 (non-commercial version).

When I ssh to my frontend and run mozilla, I get an empty Mozilla frame.
Running mozilla with the debugging option "--g-fatal-warnings" I get:

Gdk-WARNING **: Attempt to draw a drawable with depth 24 to a drawable with depth 8
aborting...

xwininfo shows the following window characteristics:

xwininfo: Window id: 0x9400034 "GCLCluster Cluster - Mozilla"

  Absolute upper-left X:  175
  Absolute upper-left Y:  150
  Relative upper-left X:  0
  Relative upper-left Y:  0
  Width: 1021
  Height: 738
  Depth: 8
  Visual Class: PseudoColor
  Border width: 0
  Class: InputOutput
  Colormap: 0x22 (installed)
  Bit Gravity State: NorthWestGravity
  Window Gravity State: NorthWestGravity
  Backing Store State: NotUseful
  Save Under State: no
  Map State: IsViewable
  Override Redirect State: no
  Corners:  +175+150  -84+150  -84-136  +175-136
  -geometry 1021x738-78+125

Is there a way to configure mozilla to use only an 8-bit drawable?

If I ssh from a workstation with an 8-bit display, mozilla starts up
OK, and creates an 8-bit window.

- Doug N
------------------------------------------------------------------------
Doug Neuhauser                University of California, Berkeley
doug at seismo.berkeley.edu   Berkeley Seismological Laboratory
Phone: 510-642-0931           215 McCone Hall # 4760
Fax: 510-643-5811             Berkeley, CA 94720-4760

From qian1129 at yahoo.com Tue Dec 30 22:47:57 2003
From: qian1129 at yahoo.com (li lee)
Date: Tue, 30 Dec 2003 22:47:57 -0800 (PST)
Subject: [Rocks-Discuss]How to install Roll CDs in Rocks 3.1.0
Message-ID: <[email protected]>

Hi,

I want to install Rocks v3.1.0 on PCs, but I do not
want to use so many CDs:

roll-grid-3.1.0-0.any.iso
roll-intel-3.1.0-0.any.iso
roll-sge-3.1.0-0.any.iso

......
So, how do I install all of these after the Rocks and HPC
installation on clusters?

Thanks

Li

From bruno at rocksclusters.org Tue Dec 30 23:35:28 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Tue, 30 Dec 2003 23:35:28 -0800
Subject: [Rocks-Discuss]How to install Roll CDs in Rocks 3.1.0
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

> I want to install Rocks v3.1.0 in PCs, but I do not
> want to so many CDs:
> roll-grid-3.1.0-0.any.iso
> roll-intel-3.1.0-0.any.iso
> roll-sge-3.1.0-0.any.iso
> ......
> So, how to install all these after Rocks and HPC
> installation on clusters?

for now, we do not have a systematic way in which to incorporate rolls after the frontend is up. this is on our 'todo' list.

- gb

From tim.carlson at pnl.gov Wed Dec 31 07:29:21 2003
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Wed, 31 Dec 2003 07:29:21 -0800 (PST)
Subject: [Rocks-Discuss]Mozilla / ssh DISPLAY problem with Rocks 3.1.0
In-Reply-To: <[email protected]>
Message-ID: <[email protected]>

On Tue, 30 Dec 2003, Doug Neuhauser wrote:

> I am having a problem using mozilla with the default Rocks monitor web page
> over an ssh session to my headnode from a Sun workstation with a 24-bit
> display. My workstation is Sun Blade 150 running Solaris 8, and I am
> using SSH Secure Shell 3.2.5 (non-commercial version).
>
> When I ssh to my frontend and to run mozilla, I get an empty Mozilla frame.
> Running mozilla with debugging options "--g-fatal-warnings" I get:

This sounds like an X tunnel problem. I see X tunnel errors all the time
(OpenGL, colormap, etc). What happens if you just set the DISPLAY
variable back to your Sun box and do the proper xhost command on the Sun?

Tim

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support

From mjk at sdsc.edu Wed Dec 31 09:45:49 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Wed, 31 Dec 2003 09:45:49 -0800
Subject: [Rocks-Discuss]How to install Roll CDs in Rocks 3.1.0
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

For this release you need all these CDs (if you want this functionality). Think of Rolls as add-on packs for Rocks, and remember that software belongs on a CD (not a tarball or ftp site). CDs are the accepted commercial way of releasing software, and they are very nice. But we have some issues with this that we are addressing right now:

- Meta-Rolls. That is, how do you merge multiple Rolls into a single CD image? This is actually very easy to do, and we have some early code for this; it will be there in the next release. For IA64 we merge the HPC Roll onto the base DVD, so we have a proof of concept here.

- Rolls cannot be added after a cluster is installed, and must be used during installation.

- Rolls cannot be uninstalled.

Rolls are maturing pretty quickly, and we know where they need to go.

-mjk

On Dec 30, 2003, at 10:47 PM, li lee wrote:

> Hi,
>
> I want to install Rocks v3.1.0 in PCs, but I do not
> want to so many CDs:
> roll-grid-3.1.0-0.any.iso
> roll-intel-3.1.0-0.any.iso
> roll-sge-3.1.0-0.any.iso
> ......
> So, how to install all these after Rocks and HPC
> installation on clusters?
>
> Thanks
>
> Li
>
> __________________________________
> Do you Yahoo!?
> Find out what made the Top Yahoo! Searches of 2003
> http://search.yahoo.com/top2003

From michal at harddata.com Wed Dec 31 10:05:26 2003
From: michal at harddata.com (Michal Jaegermann)
Date: Wed, 31 Dec 2003 11:05:26 -0700
Subject: [Rocks-Discuss]NIS outside, 411 inside?
In-Reply-To: <[email protected]>; from [email protected] on Tue, Dec 30, 2003 at 09:44:11PM -0600
References: <[email protected]>
	<[email protected]>
	<[email protected]>
Message-ID: <[email protected]>

On Tue, Dec 30, 2003 at 09:44:11PM -0600, Chris Dwan (CCGB) wrote:
>
> The shuffling below would work, I think, but it still gives me the
> willies to be mucking with the passwd file every hour:
>
> mv /etc/passwd /etc/passwd.local
> ypcat /etc/passwd > /etc/passwd.nis
> cat /etc/passwd.local /etc/passwd.nis > /etc/passwd
> service 411 commit
> cp /etc/passwd.local /etc/passwd
>
> Am I missing the simple way?


cp -p /etc/passwd /etc/passwd.local
ypcat passwd >> /etc/passwd
service 411 commit
mv /etc/passwd.local /etc/passwd

unless 'service 411' can be told to use another file. That way you minimize the time gap when you are without /etc/passwd, you make sure that the file attributes on /etc/passwd remain intact, and you are not left with extra files.

You can also play with (symbolic) links but I am not sure if every possible /etc/passwd reader will indeed follow a link.

Michal

From michal at harddata.com Wed Dec 31 10:16:18 2003
From: michal at harddata.com (Michal Jaegermann)
Date: Wed, 31 Dec 2003 11:16:18 -0700
Subject: [Rocks-Discuss]NIS outside, 411 inside?
In-Reply-To: <[email protected]>; from [email protected] on Wed, Dec 31, 2003 at 03:01:39PM +1100
References: <[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
Message-ID: <[email protected]>

On Wed, Dec 31, 2003 at 03:01:39PM +1100, Chris Samuel wrote:
> should be:
>
> cp -p /etc/passwd /etc/passwd.local
>
> Oh, and what happens if users overlap ? :-)

'sort -u' over relevant fields after replacing ':'s with blanks? But this is getting a tad more involved, and an "automatic conflict resolution" still may screw up. A bit of coordination between whomever maintains NIS and the local user data, like reserving some names and uid ranges for one or the other, is likely more effective in practice.

Michal
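The coordination check Michal describes could be sketched roughly like this: list the login names that appear in both the local passwd file and the NIS dump so a human can resolve them before the two are concatenated. The function name and file paths are illustrative, not from the thread.

```shell
#!/bin/sh
# Print login names present in BOTH a local passwd file and an NIS dump.
# Anything this prints needs manual resolution before concatenating.
passwd_conflicts() {
    # $1: local passwd file, $2: output of `ypcat passwd`
    cut -d: -f1 "$1" | sort -u > /tmp/_local.$$
    cut -d: -f1 "$2" | sort -u > /tmp/_nis.$$
    comm -12 /tmp/_local.$$ /tmp/_nis.$$   # names common to both files
    rm -f /tmp/_local.$$ /tmp/_nis.$$
}
# usage: passwd_conflicts /etc/passwd.local /etc/passwd.nis
```

The same two-file comparison works for uid collisions by cutting field 3 instead of field 1.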

From bruno at rocksclusters.org Wed Dec 31 10:42:21 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Wed, 31 Dec 2003 10:42:21 -0800
Subject: [Rocks-Discuss]Roll Documentation posted on the web site
Message-ID: <[email protected]>

just posted documentation for some of the rolls on the web site -- see the left-hand side of the web page:

http://www.rocksclusters.org/Rocks/

and here are the links to the roll documentation:


HPC Roll: http://www.rocksclusters.org/rocks-documentation/3.1.0/

SGE Roll: http://www.rocksclusters.org/roll-documentation/sge/3.1.0/

Grid Roll: http://www.rocksclusters.org/roll-documentation/grid/3.1.0/

Intel Roll: http://www.rocksclusters.org/roll-documentation/intel/3.1.0/

as a side note, for every one of the rolls you install above, the documentation will be available on your frontend at:

http://localhost/roll-documentation/

- gb

From cdwan at mail.ahc.umn.edu Wed Dec 31 11:07:37 2003
From: cdwan at mail.ahc.umn.edu (Chris Dwan (CCGB))
Date: Wed, 31 Dec 2003 13:07:37 -0600 (CST)
Subject: [Rocks-Discuss]NIS outside, 411 inside?
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
Message-ID: <[email protected]>

> this is getting somewhat tad more involved and an "automatic
> conflict resolution" still may screw up.

I agree with this assessment. The key is to keep the local passwd file as small as possible, and remove redundant accounts on the frontend node. Since it consists mostly of non-login accounts anyway, this shouldn't be too difficult... and it's a one-time task anyway.

I've settled on the hourly cron job below. I'll report any weirdness as appropriate. Thanks for all the suggestions and discussion.

#!/bin/sh
ypcat auto.master > /etc/auto.master
ypcat auto.home > /etc/auto.home
ypcat auto.net > /etc/auto.net
ypcat auto.web > /etc/auto.web

ypcat passwd > /etc/passwd.nis
cat /etc/passwd.local /etc/passwd.nis > /etc/passwd.combined
cp /etc/passwd.combined /etc/passwd

ypcat group > /etc/group.nis
cat /etc/group.local /etc/group.nis > /etc/group.combined
cp /etc/group.combined /etc/group

-Chris Dwan
 The University of Minnesota


From maz at tempestcomputers.com Wed Dec 31 11:37:09 2003
From: maz at tempestcomputers.com (John Mazza)
Date: Wed, 31 Dec 2003 14:37:09 -0500
Subject: [Rocks-Discuss]Rocks 3.1.0 with Adaptec I2O RAID
Message-ID: <[email protected]>

Does anyone know of a way to make the 3.1.0 (x86-64) version work with an Adaptec 2100S SCSI RAID card? My master node needs to use this card, but it doesn't appear to be in the kernel on the CD. Also, does it support the SysKonnect SK-9821 (Ver 2.0) Gig cards?

Thanks!

From tim.carlson at pnl.gov Wed Dec 31 12:49:25 2003
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Wed, 31 Dec 2003 12:49:25 -0800 (PST)
Subject: [Rocks-Discuss]3.1.0 surprises
In-Reply-To: <[email protected]>
Message-ID: <[email protected]>

On Mon, 29 Dec 2003, landman wrote:

> SSH is too slow. Wow. 5-10 seconds to log in.

Just getting around to this. I did a clean install on our test cluster (Dell 1550 and 1750 boxes). No delays with ssh. As root or a normal user, a "cluster-fork date" command on 4 nodes took under .6 seconds.

Sounds like you have some type of DNS issue. Did you get a bad /etc/resolv.conf file on the nodes for some reason?

> a) md (e.g. Software RAID): Just try to build one. Anaconda will
> happily let you do this ... though it will die in the formatting stages.
> Dropping into the shell (Alt-F2) and looking for the md module (lsmod)
> shows nothing. Insmod the md also doesn't do anything. Catting
> /proc/devices shows no md as a character or block device.

The odd bit here is that you can do a

modprobe raid0

on a running frontend and it gets installed, but there is no associated "md" module. Was "md" built directly into the kernel? Very odd.

> b) ext3. There is no ext3 available for the install.

This is a bit annoying. Nobody really uses ext2 anymore, do they? :) Not having ext3 as an install option isn't a show stopper for me since I can do a tune2fs after the fact. But ext3 should be there.

Having version 2.0.8 of the myrinet drivers up and running is a big + in my book. SGE 5.3p5 is also nice to see.

It will be some time before I upgrade any production clusters given the differences between RH 7.3 and WS 3.0. Too big of a jump for me right now. We first need to convert a couple hundred desktop boxes :)

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support

From James_ODell at Brown.edu Wed Dec 31 13:09:25 2003
From: James_ODell at Brown.edu (James O'Dell)
Date: Wed, 31 Dec 2003 16:09:25 -0500
Subject: [Rocks-Discuss]3.1.0 surprises
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

For whatever it's worth, MPICH works MUCH better when run over rsh than ssh. It seems as if ssh doesn't pass along signals nearly as well as rsh. Since enabling rsh and configuring MPICH to use it, we have had no zombie jobs on our compute nodes. When using ssh they were a common occurrence. In fact, if you look at the MPICH implementation for myrinet, you'll see the contortions that they use to try and clean up compute nodes when using ssh.

Jim

On Dec 31, 2003, at 3:49 PM, Tim Carlson wrote:

> On Mon, 29 Dec 2003, landman wrote:
>
>> SSH is too slow. Wow. 5-10 seconds to log in.
>
> Just getting around to this. I did a clean install on our test cluster
> (Dell 1550 and 1750 boxes). No delays with ssh. As root or a normal
> user, a "cluster-fork date" command on 4 nodes took under .6 seconds
>
> Sounds like you have some type of DNS issue. Did you get a bad
> /etc/resolv.conf file on the nodes for some reason?
>
>> a) md (e.g. Software RAID): Just try to build one. Anaconda will
>> happily let you do this ... though it will die in the formatting stages.
>> Dropping into the shell (Alt-F2) and looking for the md module (lsmod)
>> shows nothing. Insmod the md also doesn't do anything. Catting
>> /proc/devices shows no md as a character or block device.
>
> The odd bit here is that you can do a
>
> modprobe raid0
>
> on a running frontend and it gets installed but there is no associated
> "md" module. Was "md" built directly into the kernel? very odd.
>
>> b) ext3. There is no ext3 available for the install.
>
> This is a bit annoying. Nobody really uses ext2 anymore do they? :) Not
> having ext3 as an install option isn't a show stopper for me since I can
> do a tune2fs after the fact. But ext3 should be there.
>
> Having version 2.0.8 of the myrinet drivers up and running is a big +
> in my book. SGE 5.3p5 is also nice to see.
>
> It will be some time before I upgrade any production clusters given the
> differences between Rh 7.3 and WS 3.0. Too big of a jump for me right
> now. We first need to convert a couple hundred desktop boxes :)
>
> Tim Carlson
> Voice: (509) 376 3423
> Email: Tim.Carlson at pnl.gov
> EMSL UNIX System Support

From landman at scalableinformatics.com Wed Dec 31 14:46:22 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Wed, 31 Dec 2003 17:46:22 -0500
Subject: [Rocks-Discuss]3.1.0 surprises
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

On Wed, 2003-12-31 at 15:49, Tim Carlson wrote:
> On Mon, 29 Dec 2003, landman wrote:
>
>> SSH is too slow. Wow. 5-10 seconds to log in.
>
> Just getting around to this. I did a clean install on our test cluster
> (Dell 1550 and 1750 boxes). No delays with ssh. As root or a normal
> user, a "cluster-fork date" command on 4 nodes took under .6 seconds

Yeah, some weirdness in DNS. A re-load on one cluster head took care of it; on the other, applying dnsmasq helped.

> Sounds like you have some type of DNS issue. Did you get a bad
> /etc/resolv.conf file on the nodes for some reason?
>
>> a) md (e.g. Software RAID): Just try to build one. Anaconda will
>> happily let you do this ... though it will die in the formatting stages.
>> Dropping into the shell (Alt-F2) and looking for the md module (lsmod)
>> shows nothing. Insmod the md also doesn't do anything. Catting
>> /proc/devices shows no md as a character or block device.
>
> The odd bit here is that you can do a
>
> modprobe raid0
>
> on a running frontend and it gets installed but there is no associated
> "md" module. Was "md" built directly into the kernel? very odd.

True, but I wanted to do a raid 1. I tried the insmod raid1 but it didn't work; from what I can see the module was not in the build. This is ok, as some of it can be done later.

>> b) ext3. There is no ext3 available for the install.
>
> This is a bit annoying. Nobody really uses ext2 anymore do they? :) Not
> having ext3 as an install option isn't a show stopper for me since I can
> do a tune2fs after the fact. But ext3 should be there.

That's what I did. I'll post a quick set of instructions for this a little later.
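For reference, the after-the-fact conversion Tim and Joe both mention comes down to adding a journal with tune2fs and then switching the filesystem type in /etc/fstab. A hedged sketch — the device name is hypothetical and the fstab edit is left as a comment, since the exact entry is site-specific:

```shell
#!/bin/sh
# Sketch: convert an existing ext2 filesystem to ext3 after install.
to_ext3() {
    tune2fs -j "$1" || return 1   # add an ext3 journal to the filesystem
    # Then change the fs type from ext2 to ext3 for "$1" in /etc/fstab
    # and remount (or reboot) so the journal is actually used.
}
# usage (hypothetical device): to_ext3 /dev/hda1
```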

> Having version 2.0.8 of the myrinet drivers up and running is a big + in
> my book. SGE 5.3p5 is also nice to see.

I agree, though I would like to see people do a

cluster-fork "/etc/init.d/rcsge stop"
cluster-fork "chown -R root:root /opt/gridengine/bin /opt/gridengine/utilbin"
cluster-fork "/etc/init.d/rcsge start"

to fix the compute node sge permissions. Some of the utils don't work otherwise.

> It will be some time before I upgrade any production clusters given the
> differences between Rh 7.3 and WS 3.0. Too big of a jump for me right now.
> We first need to convert a couple hundred desktop boxes :)

:)

> Tim Carlson
> Voice: (509) 376 3423
> Email: Tim.Carlson at pnl.gov
> EMSL UNIX System Support

From landman at scalableinformatics.com Wed Dec 31 14:48:08 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Wed, 31 Dec 2003 17:48:08 -0500
Subject: [Rocks-Discuss]3.1.0 surprises
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]>
Message-ID: <[email protected]>

Hi James:

Did you rebuild MPICH for this? I noticed the signal handling bit using mpiBLAST. Lots of zombies to deal with.

Joe

On Wed, 2003-12-31 at 16:09, James O'Dell wrote:
> For whatever its worth, MPICH works MUCH better when run over rsh that
> ssh. It seems as if ssh doesn't pass along signals nearly as well as
> rsh. Since enabling rsh and configuring MPICH to use it, we have had no
> Zombie jobs on our compute nodes. When using SSH they were a common
> occurrence. In fact, if you look at the MPICH implementation for
> myrinet, you'll see the contortions that they use to try and clean up
> compute nodes when using ssh.
>
> Jim
>
> On Dec 31, 2003, at 3:49 PM, Tim Carlson wrote:
>
>> On Mon, 29 Dec 2003, landman wrote:
>>
>>> SSH is too slow. Wow. 5-10 seconds to log in.
>>
>> Just getting around to this. I did a clean install on our test cluster
>> (Dell 1550 and 1750 boxes). No delays with ssh. As root or a normal
>> user, a "cluster-fork date" command on 4 nodes took under .6 seconds
>>
>> Sounds like you have some type of DNS issue. Did you get a bad
>> /etc/resolv.conf file on the nodes for some reason?
>>
>>> a) md (e.g. Software RAID): Just try to build one. Anaconda will
>>> happily let you do this ... though it will die in the formatting
>>> stages.
>>> Dropping into the shell (Alt-F2) and looking for the md module (lsmod)
>>> shows nothing. Insmod the md also doesn't do anything. Catting
>>> /proc/devices shows no md as a character or block device.
>>
>> The odd bit here is that you can do a
>>
>> modprobe raid0
>>
>> on a running frontend and it gets installed but there is no associated
>> "md" module. Was "md" built directly into the kernel? very odd.
>>
>>> b) ext3. There is no ext3 available for the install.
>>
>> This is a bit annoying. Nobody really uses ext2 anymore do they? :) Not
>> having ext3 as an install option isn't a show stopper for me since I
>> can do a tune2fs after the fact. But ext3 should be there.
>>
>> Having version 2.0.8 of the myrinet drivers up and running is a big +
>> in my book. SGE 5.3p5 is also nice to see.
>>
>> It will be some time before I upgrade any production clusters given the
>> differences between Rh 7.3 and WS 3.0. Too big of a jump for me right
>> now. We first need to convert a couple hundred desktop boxes :)
>>
>> Tim Carlson
>> Voice: (509) 376 3423
>> Email: Tim.Carlson at pnl.gov
>> EMSL UNIX System Support

From James_ODell at Brown.edu Wed Dec 31 15:12:59 2003
From: James_ODell at Brown.edu (James O'Dell)
Date: Wed, 31 Dec 2003 18:12:59 -0500
Subject: [Rocks-Discuss]3.1.0 surprises
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]>
	<[email protected]>
Message-ID: <[email protected]>

The cheap way to do it is to grep the bin directory and look for SSH in the execution scripts. You can change them to RSH and MPICH will use RSH to execute.

An alternative is to set RSHCOMMAND=rsh during a rebuild. I'm pretty sure that this method accomplishes precisely the same thing as simply editing the execution scripts.

Jim
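The "cheap way" Jim describes could be scripted roughly as below. The MPICH bin directory is an assumption (point it at your own installation), a backup of each wrapper is kept since the scripts are edited in place, and the substitution is deliberately crude — it swaps every occurrence of "ssh":

```shell
#!/bin/sh
# Rewrite ssh -> rsh in MPICH's mpirun wrapper scripts (the "cheap way").
mpich_use_rsh() {
    # $1: directory holding the mpirun wrapper scripts (hypothetical)
    for f in "$1"/mpirun*; do
        [ -f "$f" ] || continue
        cp "$f" "$f.bak"                  # keep a backup of each script
        sed 's/ssh/rsh/g' "$f.bak" > "$f" # crude: swaps every "ssh"
    done
}
# usage: mpich_use_rsh /usr/local/mpich/bin   # hypothetical path
```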

On Dec 31, 2003, at 5:48 PM, Joe Landman wrote:

> Hi James:
>
> Did you rebuild MPICH for this? I noticed the signal handling bit
> using mpiBLAST. Lots of zombies to deal with.
>
> Joe
>
> On Wed, 2003-12-31 at 16:09, James O'Dell wrote:
>> For whatever its worth, MPICH works MUCH better when run over rsh that
>> ssh. It seems as if ssh doesn't pass along signals nearly as well as
>> rsh. Since enabling rsh and configuring MPICH to use it, we have had
>> no Zombie jobs on our compute nodes. When using SSH they were a common
>> occurrence. In fact, if you look at the MPICH implementation for
>> myrinet, you'll see the contortions that they use to try and clean up
>> compute nodes when using ssh.
>>
>> Jim
>>
>> On Dec 31, 2003, at 3:49 PM, Tim Carlson wrote:
>>
>>> On Mon, 29 Dec 2003, landman wrote:
>>>
>>>> SSH is too slow. Wow. 5-10 seconds to log in.
>>>
>>> Just getting around to this. I did a clean install on our test
>>> cluster (Dell 1550 and 1750 boxes). No delays with ssh. As root or a
>>> normal user, a "cluster-fork date" command on 4 nodes took under .6
>>> seconds
>>>
>>> Sounds like you have some type of DNS issue. Did you get a bad
>>> /etc/resolv.conf file on the nodes for some reason?
>>>
>>>> a) md (e.g. Software RAID): Just try to build one. Anaconda will
>>>> happily let you do this ... though it will die in the formatting
>>>> stages.
>>>> Dropping into the shell (Alt-F2) and looking for the md module
>>>> (lsmod) shows nothing. Insmod the md also doesn't do anything.
>>>> Catting /proc/devices shows no md as a character or block device.
>>>
>>> The odd bit here is that you can do a
>>>
>>> modprobe raid0
>>>
>>> on a running frontend and it gets installed but there is no
>>> associated "md" module. Was "md" built directly into the kernel?
>>> very odd.
>>>
>>>> b) ext3. There is no ext3 available for the install.
>>>
>>> This is a bit annoying. Nobody really uses ext2 anymore do they? :)
>>> Not having ext3 as an install option isn't a show stopper for me
>>> since I can do a tune2fs after the fact. But ext3 should be there.
>>>
>>> Having version 2.0.8 of the myrinet drivers up and running is a big +
>>> in my book. SGE 5.3p5 is also nice to see.
>>>
>>> It will be some time before I upgrade any production clusters given
>>> the differences between Rh 7.3 and WS 3.0. Too big of a jump for me
>>> right now. We first need to convert a couple hundred desktop boxes :)
>>>
>>> Tim Carlson
>>> Voice: (509) 376 3423
>>> Email: Tim.Carlson at pnl.gov
>>> EMSL UNIX System Support

From bruno at rocksclusters.org Wed Dec 31 15:46:23 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Wed, 31 Dec 2003 15:46:23 -0800
Subject: [Rocks-Discuss]3.1.0 surprises
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]>
Message-ID: <[email protected]>


>> Having version 2.0.8 of the myrinet drivers up and running is a big +
>> in my book. SGE 5.3p5 is also nice to see.
>
> I agree, though I would like to see people do a
>
> cluster-fork "/etc/init.d/rcsge stop"
> cluster-fork "chown -R root:root /opt/gridengine/bin /opt/gridengine/utilbin"
> cluster-fork "/etc/init.d/rcsge start"
>
> to fix the compute node sge permissions. Some of the utils don't work
> otherwise.

so we can test the fixes, what utilities need the above changes?

- gb

From landman at scalableinformatics.com Wed Dec 31 21:04:14 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Thu, 01 Jan 2004 00:04:14 -0500
Subject: [Rocks-Discuss]looking for a work-around
Message-ID: <[email protected]>

Ok, this one is weird. On two different clusters using the same replace-auto-partition.xml I get two completely different behaviors. I am positive this is an anaconda issue, but it could be something else.

Both systems have IDE hard disks. I made the second one (my office system) match the other system, so the IDE hard disks are hda and hdb. Yes, I know this is not ideal, and I know that this should be changed. I am simply trying to match their system.

First the partitioning:

<main>
  <clearpart>--all</clearpart>
  <part> / --size 4096 --ondisk hda </part>
  <part> swap --size 1024 --ondisk hda </part>
  <part> raid.00 --size 1 --grow --ondisk hda </part>
  <part> /tmp --size 4096 --ondisk hdb </part>
  <part> swap --size 1024 --ondisk hdb </part>
  <part> raid.01 --size 1 --grow --ondisk hdb </part>
</main>

On one cluster (my office), this works perfectly.

On the other cluster, it fails with:

An unhandled exception has occurred. This is most likely a bug. Please
copy the full text of this exception or save the crash dump to a floppy
then file a detailed bug report against anaconda at
http://bugzilla.redhat.com/bugzilla/

Traceback (most recent call last):
  File "/usr/bin/anaconda.real", line 1081, in ?
    intf.run(id, dispatch, configFileData)
  File "/var/tmp/anaconda-9.1//usr/lib/anaconda/text.py", line 448, in run
  File "/tmp/ksclass.py", line 799, in __call__
KeyError: swap

  [ OK ]   [ Save ]   [ Debug ]

(sorry about the garbled dialog-box characters). It appears that this is a python KeyError, which occurs when the element being sought has not been found.

Any ideas?

Joe

-- 
Joseph Landman, Ph.D
Scalable Informatics LLC
email: landman at scalableinformatics.com
web: http://scalableinformatics.com
phone: +1 734 612 4615