
Page 1:

Design & Management of the JLAB Farms

Ian Bird, Jefferson Lab
May 24, 2001

FNAL LCCWS

Page 2:

Overview

• JLAB clusters
  – Aims
  – Description
  – Environment
• Batch software
• Management
  – Configuration
  – Maintenance
  – Monitoring
• Performance monitoring
• Comments

Page 3:

Clusters at JLAB - 1

• Farm
  – Supports experiments: reconstruction, analysis
  – 250 (→ 320) Intel Linux CPUs (+ 8 Sun Solaris)
    • 6400 → 8000 SPECint95
  – Goals:
    • Provide 2 passes of first-level reconstruction at the average incoming data rate (10 MB/s)
    • (More recently) provide an analysis, simulation, and general batch facility
  – Systems:
    • First phase (1997) was 5 dual Ultra2 + 5 dual IBM 43P
    • 10 dual Linux (PII 300) acquired in 1998
    • Currently 165 dual PII/III (300, 400, 450, 500, 750 MHz, 1 GHz)
      – ASUS motherboards, 256 MB, ~40 GB SCSI or IDE, 100 Mbit
      – First 75 systems are towers, 50 are 2U rackmount, 40 are 1U (½U?)
• Interactive front-ends
  – Sun E450s and 4-processor Intel Xeons (2 of each), 2 GB RAM, Gigabit Ethernet

Page 4:

Intel Linux Farm (photos)

First purchases: 9 duals per 24" rack
Last summer: 16 duals (2U) + 500 GB cache (8U) per 19" rack
Recently: 5 TB IDE cache disk (5 x 8U) per 19" rack

Page 5:

Clusters at JLAB - 2

• Lattice QCD cluster(s)
  – Existing clusters (in collaboration with MIT), at JLAB:
    • Compaq Alpha
      – 16 XP1000 (500 MHz 21264), 256 or 512 MB, 100 Mbit
      – 12 dual UP2000 (667 MHz 21264), 256 MB, 100 Mbit
    • All have Myrinet interconnect
    • Front-end (login) machine has Gigabit Ethernet and a 400 GB fileserver for data staging and MIT ↔ JLAB transfers
  – Anticipated (funded):
    • 128 CPUs (June 2001), Alpha or P4(?) in 1U
    • 128 CPUs (Dec/Jan?), identical to the first 128
    • Myrinet

Page 6:

LQCD Clusters

16 single Alpha 21264, 1999

12 dual Alpha (Linux Networks), 2000

Page 7:

Environment

• JLAB has a central computing environment (CUE)
  – NetApp fileservers (NFS & CIFS)
    • Home directories, group (software) areas, etc.
  – Centrally provided software applications
• Available in:
  – General computing environment
  – Farms and clusters
  – Managed desktops
• Compatibility between all environments: home and group areas are available in the farm, library compatibility, etc.
• Locally written software provides access to the farm (and mass storage) from any JLAB system
• Campus network backbone is Gigabit Ethernet, with 100 Mbit to physicist desktops and OC-3 to ESnet

Page 8:

Jefferson Lab Mass Storage and Farm Systems, 2001 (diagram)

– Work file servers: 10 TB, RAID 5
– Farm cache file servers: 4 x 400 GB
– DST/cache file servers: 15 TB, RAID 0
– Batch and interactive farm
– DB server
– Tape servers
– Inputs from CLAS DAQ and Hall A/C DAQ; links at 100 Mbit/s and 1000 Mbit/s, FC-AL and SCSI

Page 9:

Lattice QCD Metacenter (diagram)

– JLab cluster: 64 (→ 128) dual Alpha 21264 nodes + 2 interactive duals (Alpha 21264), 128-port Myrinet switch, cluster Ethernet switch (100 Mb to nodes, Gigabit uplink), 400 GB staging disk, development cluster
– JLab infrastructure: Cisco CAT 5500 switch, OC-3 to ESnet, quad Sun E4000, 270 GB + 342 GB staging disks, 300 TB mass storage system with STK Redwood and STK 9840 tape drives, other JLab computers
– MIT cluster: 12 quad Alpha 21264 nodes + 2 interactive Alpha 21164 machines, 16-port Myrinet switch, cluster Ethernet switch (100 Mb), 296 GB disk, development cluster with dual Pentium RAID file server
– MIT infrastructure: Alantec Ethernet switch, AlphaServer 8400 (12 CPU) with 1 TB disk, two MetaStore SH7400 file servers (1 TB each), 13 TB DLT library, Sun E6000 (10 CPU) with 120 GB, other MIT computers

Page 10:

Batch Software

• Farm
  – Uses LSF (v4.0.1)
    • Pricing is now acceptable
  – Manage resource allocation with:
    • Job queues
      – Production (reconstruction, etc.)
      – Low-priority (for simulations), high-priority (short jobs)
      – Idle (pre-emptable)
    • User + group allocations (shares)
      – Make full use of hierarchical shares: allows a single undivided cluster to be used efficiently by many groups
      – E.g. the share-tree sketch following this slide
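As a rough illustration of the hierarchical-share idea, the Python sketch below shows how a share tree carves one undivided cluster into per-group entitlements. The group names and weights are hypothetical, not JLAB's actual allocation, and real LSF fairshare also folds in recent usage and running jobs; this only shows the static split.

  # Hypothetical share tree; each entry is (share weight, sub-tree).
  SHARE_TREE = {
      "halls": (60, {
          "hall_a": (30, {}),
          "hall_b": (50, {}),
          "hall_c": (20, {}),
      }),
      "theory":  (25, {}),
      "general": (15, {}),
  }

  def allocate(tree, slots):
      """Split 'slots' among the tree's nodes in proportion to their weights,
      recursing into sub-groups so the cluster stays one shared pool."""
      total = float(sum(weight for weight, _ in tree.values()))
      shares = {}
      for name, (weight, children) in tree.items():
          portion = slots * weight / total
          shares[name] = portion
          shares.update(allocate(children, portion))
      return shares

  if __name__ == "__main__":
      # ~330 CPUs in the farm; print the static entitlement of each group.
      for group, cpus in sorted(allocate(SHARE_TREE, 330).items()):
          print("%-10s %6.1f CPUs" % (group, cpus))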

Page 11:

Batch software - 2

• Users do not use LSF directly; they use a Java client (jsub) that:
  – Is available from any machine (does not need LSF)
  – Provides missing functionality, e.g.:
    • Submit 1000 jobs in 1 command
    • Fetches files from tape, pre-staging them before the job is queued for execution (don't block the farm with jobs waiting for data)
  – Ensures efficient retrieval of files from tape – e.g. sort 1000 files by tape and by file number on tape (illustrated in the sketch after this slide)
  – Web interface (via servlet) to monitor job status and progress (as well as host, queue, etc.)
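The tape-ordering step can be pictured with a small Python sketch. The catalogue dictionary and its field names are stand-ins for whatever metadata service maps a file to a tape volume and position; jsub itself is a Java client talking to the real mass-storage catalogue.

  # Sketch of ordering tape-resident files before staging, so each tape is
  # mounted once and read sequentially rather than seeking back and forth.
  def order_stage_requests(files, catalogue):
      """Return files grouped by tape and sorted by position on each tape."""
      return sorted(files, key=lambda f: (catalogue[f]["tape"], catalogue[f]["fileno"]))

  # Example: three requests spread over two (hypothetical) tape volumes.
  catalogue = {
      "run1001.evt": {"tape": "VOL0042", "fileno": 17},
      "run1002.evt": {"tape": "VOL0041", "fileno": 3},
      "run1003.evt": {"tape": "VOL0042", "fileno": 2},
  }
  for f in order_stage_requests(list(catalogue), catalogue):
      print(f, catalogue[f])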

Page 12:

View job status

Page 13:

View host status

Page 14:

Batch software - 3

• LQCD clusters use PBS
  – JLAB-written scheduler
    • 7 stages – mimics LSF hierarchical behaviour
  – Users access PBS commands directly (a sample job script follows this slide)
  – Web interface (portal) – authorization based on certificates
    • Used to submit jobs between the JLAB & MIT clusters
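For illustration, a minimal PBS submission script for such a cluster might look like the following. The job name, queue name, node counts, and executable are hypothetical, and the exact mpirun options depend on the local MPI/Myrinet installation.

  #!/bin/sh
  #PBS -N lqcd_test            # job name (placeholder)
  #PBS -q production           # hypothetical queue name
  #PBS -l nodes=4:ppn=2        # 4 dual nodes = 8 processors
  #PBS -l walltime=04:00:00
  cd $PBS_O_WORKDIR            # directory the job was submitted from
  # one MPI process per allocated processor, host list taken from PBS
  mpirun -np 8 -machinefile $PBS_NODEFILE ./lqcd_benchmark

The script would be submitted with "qsub jobscript.sh" and monitored with "qstat".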

Page 15:

Page 16:

Batch software - 4

• Future
  – Combine jsub & LQCD portal features to wrap both LSF and PBS
  – XML-based description language
  – Provide a web-interface toolkit to experiments to enable them to generate jobs based on experiment run data
  – In the context of PPDG

Page 17:

Cluster management

• Configuration
  – Initial configuration
    • Kickstart, plus 2 post-install scripts for configuration and software install (LSF etc.), driven by a floppy
    • Looking at PXE + DHCP (available on newer motherboards)
      – Avoids the need for a floppy – just power on
      – System working (as of last week)
      – Software:
        » PXE standard bootprom (www.nilo.org/docs/pxe.html) – talks to DHCP (see the dhcpd sketch after this slide)
        » bpbatch – pre-boot shell (www.bpbatch.org) – downloads vmlinux, kickstart, etc.
    • Alphas configured "by hand + kickstart"
  – Updates etc.
    • Autorpm (especially for patches)
    • New kernels – by hand, with scripts
• OS upgrades
  – Rolling upgrades – use queues to manage the transition
• Missing piece:
  – Remote, network-accessible console screen access
    • Have used serial console, KVM switches, monitor on a cart …
    • Linux Networks Alphas have remote power management – don't use!
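To illustrate the PXE/DHCP piece, an ISC dhcpd entry pointing a farm node at the pre-boot loader could look roughly like this. The host name, MAC, addresses, and boot file name are hypothetical placeholders; the exact boot image depends on the local bpbatch setup.

  # dhcpd.conf fragment (hypothetical values) for a PXE-booting farm node.
  host farm101 {
      hardware ethernet 00:a0:c9:12:34:56;   # node's NIC MAC address
      fixed-address 192.168.10.101;
      next-server 192.168.10.1;              # TFTP server holding the boot files
      filename "bpbatch.P";                  # bpbatch pre-boot loader for PXE clients
  }

The node then fetches bpbatch over TFTP, which in turn pulls down vmlinux and the kickstart configuration as described above.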

Page 18:

System monitoring

• Farm systems
  – LM78 used to monitor temperature + fans via /proc (a threshold-check sketch follows this slide)
    • This was our largest failure mode for Pentiums
  – Mon (www.kernel.org/software/mon)
    • Used extensively for all our systems – pages the "on-call"
    • For the batch farm, checks are mostly fan, temperature, ping
  – Mprime (prime number search) has checks on memory and arithmetic integrity
    • Used in initial system burn-in
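A minimal Python sketch of the kind of threshold check this implies is shown below. The /proc path follows the lm_sensors layout of that era and the limits are placeholders; this is not the actual JLAB mon monitor, just the idea of one.

  #!/usr/bin/env python
  # Sketch of a fan/temperature check; paths and limits are placeholders.
  import glob, sys

  TEMP_LIMIT_C = 55.0     # hypothetical alarm threshold
  FAN_MIN_RPM  = 2000.0

  def read_values(path):
      """lm_sensors /proc files hold whitespace-separated numbers; the current reading is last."""
      with open(path) as f:
          return [float(x) for x in f.read().split()]

  alarms = []
  for path in glob.glob("/proc/sys/dev/sensors/*/temp*"):
      if read_values(path)[-1] > TEMP_LIMIT_C:
          alarms.append("high temperature: %s" % path)
  for path in glob.glob("/proc/sys/dev/sensors/*/fan*"):
      if read_values(path)[-1] < FAN_MIN_RPM:
          alarms.append("slow/stopped fan: %s" % path)

  if alarms:
      print("\n".join(alarms))
      sys.exit(1)   # non-zero exit lets a wrapper (e.g. mon) raise an alert/page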

Page 19:

Monitoring

Page 20:

Performance monitoring

• Use a variety of mechanisms
  – Publish weekly tables and graphs based on LSF statistics
  – Graphs from mrtg/rrd (see the sketch after this slide)
    • Network performance, number of jobs, utilization, etc.
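As a sketch of how a job-count graph can be fed (modern Python shown; it assumes an RRD file with a single GAUGE data source was created separately, that LSF's bjobs and rrdtool are on the path, and that the script runs from cron):

  #!/usr/bin/env python
  # Sketch: count running LSF jobs and log the value into an RRD for graphing.
  import subprocess, time

  # 'bjobs -u all -r' lists running jobs for all users; skip the header line.
  out = subprocess.run(["bjobs", "-u", "all", "-r"],
                       capture_output=True, text=True).stdout
  lines = [l for l in out.splitlines() if l.strip()]
  running = max(len(lines) - 1, 0)

  # rrdtool update <file> <timestamp>:<value>  (farm_jobs.rrd is a placeholder name)
  subprocess.run(["rrdtool", "update", "farm_jobs.rrd",
                  "%d:%d" % (int(time.time()), running)], check=True)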

Page 21:

Page 22:

Page 23:

Comments & Issues

• Space – very limited
  – Installing a new STK silo meant moving all sys admins out
    • Now have no admins in the same building as the machine room
  – Plans to build a new Computer Center …
• Have always been lights-out

Page 24:

Future

• Accelerator and experiment upgrades
  – Expect first data in 2006, full rate in 2007
  – 100 MB/s data acquisition
  – 1–3 PB/year (1 PB raw, > 1 PB simulated)
  – Compute clusters for:
    • Level 3 triggers
    • Reconstruction
    • Simulation
    • Analysis – PWA can be parallelized, but needs access to very large reconstructed and simulated datasets
• Expansion of LQCD clusters
  – 10 Tflops by 2005