nersc site report

Office of Science, U.S. Department of Energy. NERSC Site Report. HEPiX, October 20, 2003, TRIUMF.

DESCRIPTION

NERSC Site Report. HEPiX, October 20, 2003, TRIUMF. LBL, NERSC, and PDSF: LBL manages the NERSC Center for DOE. PDSF is the production Linux cluster at NERSC, used primarily for HEP science. The site report touches on activities of interest to the HEPiX community at each of these levels.

TRANSCRIPT

Page 1: NERSC Site Report

Office of Science

U.S. Department of Energy

NERSC Site Report

HEPiX

October 20, 2003

TRIUMF

Page 2: NERSC Site Report

LBL, NERSC, and PDSF

• LBL manages the NERSC Center for DOE

• PDSF is the production Linux cluster at NERSC used primarily for HEP science

• Site report will touch on activities of interest to HEPiX community at each of these levels

Page 3: NERSC Site Report

PDSF - New Hardware

• 96 Dual Athlon Systems

• 8 Storage Nodes - ~18 TB formatted

• All gigabit attached (Dell switches)

• Purchased two Opteron systems for testing

Page 4: NERSC Site Report

PDSF Projects

• HostDB - Presentation later

• Sun GridEngine Evaluation
– Met all requirements (long list)
– Putting in semi-production on retired nodes

• Grid certificate DN kernel module

• 1-wire based monitoring and control network

• High Availability Server
– Uses heartbeat code
– IDE based Fibre-Channel array
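
The heartbeat idea behind the High Availability Server item can be sketched roughly as follows. This is only an illustration of the general technique (a standby node takes over when the primary stops sending heartbeats), not the heartbeat code PDSF actually runs. The class name `HeartbeatMonitor`, the 3-second timeout, and the node name `primary` are all hypothetical.

```python
class HeartbeatMonitor:
    """Track the last heartbeat seen from each peer and decide when to fail over."""

    def __init__(self, timeout_s):
        # Hypothetical dead-time threshold: how long silence is tolerated.
        self.timeout_s = timeout_s
        self.last_seen = {}

    def beat(self, node, now):
        """Record a heartbeat from `node` at time `now` (seconds)."""
        self.last_seen[node] = now

    def is_alive(self, node, now):
        """A node counts as alive if a heartbeat arrived within the timeout."""
        last = self.last_seen.get(node)
        return last is not None and (now - last) <= self.timeout_s

    def should_take_over(self, primary, now):
        """The standby takes over once the primary is no longer alive."""
        return not self.is_alive(primary, now)


monitor = HeartbeatMonitor(timeout_s=3.0)
monitor.beat("primary", now=100.0)
print(monitor.should_take_over("primary", now=102.0))  # within the timeout
print(monitor.should_take_over("primary", now=104.5))  # heartbeat overdue
```

Passing the clock value in explicitly keeps the sketch deterministic; a real deployment would read the system clock and would also need to guard against both nodes becoming active at once.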

Page 5: NERSC Site Report

PDSF - Other news

• Aztera
– Zambeel folded
– StorAd is making best effort to support the system

• New User Groups
– KamLAND
– e896
– ALICE

Page 6: NERSC Site Report

IBM SP

• Upgraded
– 208 nodes added - 16-way Nighthawk II
– Additional 20 TB of disk

• Total System
– 10 Tflop/s peak
– 7.8 TB memory
– 44 TB of GPFS storage

Page 7: NERSC Site Report

Mass Storage

• Hardware
– New DataDirect disk cache
– New tape drives allow high capacity cartridges (200 GB)

• Software
– Currently running HPSS 4.3
– Testing 5.1

• Testing
– DMAPI
– htar command

Page 8: NERSC Site Report

Grid Activities

• GridFTP and gatekeeper deployed on all production systems (except gatekeeper on Seaborg, which is coming soon)

• Integrating account management system with grid certificates

• Testing myproxy based system

• Portal

• Web interface to HPSS

Page 9: NERSC Site Report

Networking

• Jumbo frame support to ESNET
– Looking for other sites to test Jumbo across WAN

• New production router (Juniper)
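
As rough background on why jumbo frames are of interest for WAN transfers, the sketch below compares the share of on-wire bytes that carry TCP payload at two MTU sizes. It assumes that "Jumbo" means a 9000-byte MTU, that traffic is IPv4 and TCP with no header options, and standard Ethernet framing overhead. None of these figures come from the slides, and the function names are hypothetical.

```python
# Per-frame Ethernet overhead on the wire, in bytes:
# preamble (8) + header (14) + frame check sequence (4) + inter-frame gap (12).
ETH_OVERHEAD = 38
# IPv4 header (20) + TCP header (20), assuming no options.
TCPIP_HEADERS = 40


def payload_efficiency(mtu):
    """Fraction of on-wire bytes that carry TCP payload for a full-size frame."""
    return (mtu - TCPIP_HEADERS) / (mtu + ETH_OVERHEAD)


def frames_needed(nbytes, mtu):
    """Number of full-size frames needed to carry nbytes of TCP payload."""
    payload = mtu - TCPIP_HEADERS
    return -(-nbytes // payload)  # ceiling division


for mtu in (1500, 9000):  # standard MTU vs. assumed jumbo MTU
    print(mtu, round(payload_efficiency(mtu), 4), frames_needed(10**9, mtu))
```

The efficiency gain is modest; the larger effect is usually the roughly sixfold drop in frames (and hence per-packet processing) for the same amount of data.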

Page 10: NERSC Site Report

GUPFS

• Hardware testbed:
– 3Par Data
– Yotta Yotta
– Dell EMC
– Dot Hill
– Data Direct (Soon)
– Panasas

• Interconnect hardware:
– Topspin (IB)
– Infinicon (IB)
– Cisco (ISCSI)
– Qlogic (ISCSI)
– Adaptec (ISCSI)
– Myrinet 2000
– Various FC

• Filesystems:
– ADIC license
– GPFS license
– GFS 5.2 license
– Lustre

• Test clients:
– Dual processor 2.2 GHz Xeons
– 2 GB memory
– 2 PCI-X
– Local HD for OS

Page 11: NERSC Site Report

Distributed System Dept.

• Net100 (http://www.net100.org/) - Built on Web100 (PSC, NCAR, NCSA) and NetLogger (LBNL), Net100 modifies operating systems to respond dynamically to network conditions and make adjustments in network transfers, sending data as fast as the network will allow.

• Self Configuring Network Monitor (SCNM) (http://dsd.lbl.gov/Net-Mon/Self-Config.html) - Provides accurate, comprehensive, on-demand application-to-application monitoring capabilities throughout the interior of the interconnecting network domains.
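
The Net100 item above is about adapting transfers to network conditions. The slides do not describe how Net100 does this, so the sketch below is general TCP background only, not Net100's algorithm. It shows the bandwidth-delay product, the amount of data that must be in flight to fill a path, and the throughput ceiling imposed by a fixed window. The function names and the example figures (1 Gbit/s, 50 ms round trip, 64 KiB window) are illustrative assumptions.

```python
def bdp_bytes(bandwidth_bits_per_s, rtt_s):
    """Bandwidth-delay product: bytes in flight needed to keep the path full."""
    return bandwidth_bits_per_s * rtt_s / 8


def max_throughput_bits_per_s(window_bytes, rtt_s):
    """Upper bound on throughput when at most window_bytes are in flight per RTT."""
    return window_bytes * 8 / rtt_s


# Hypothetical path: 1 Gbit/s bottleneck with a 50 ms round-trip time.
needed = bdp_bytes(1e9, 0.050)
# Hypothetical untuned sender limited to a 64 KiB window on the same path.
ceiling = max_throughput_bits_per_s(64 * 1024, 0.050)

print(f"window needed to fill the path: {needed / 1e6:.2f} MB")
print(f"throughput with a 64 KiB window: {ceiling / 1e6:.1f} Mbit/s")
```

The gap between the two numbers is why buffer sizes matter on long, fast paths: a small fixed window caps throughput far below the link rate regardless of available bandwidth.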

Page 12: NERSC Site Report

Distributed Systems (cont’d)

• NetLogger (http://www-didc.lbl.gov/NetLogger/)

• pyGlobus (http://dsd.lbl.gov/gtg/projects/pyGlobus/)
Python interface to the Globus Toolkit. LIGO gravity wave experiment is using it to replicate TB/day data around the US with the LIGO Data Replicator (http://www.lsc-group.phys.uwm.edu/LDR/)

• DOEGrids.org - PKI for the DOE science community, part of a federation supporting international scientific collaborations

Page 13: NERSC Site Report

Repaired Hardware

• Systems from 2000 had widespread failures (half of 90 systems)

• Had broken systems inspected by LBL Electronics Shop

• Discovered 4 bad capacitors (~$2)

• Prep’d systems can be repaired for ~$20/board

• 16 systems repaired so far

• Plan to eventually repair all systems from the batch
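
The figures on this slide allow a rough cost estimate, sketched below. It assumes one board per system, treats "half of 90" as exactly 45 failed systems, and takes the ~$20/board figure at face value. The function name is hypothetical and the totals are estimates, not numbers reported in the talk.

```python
TOTAL_SYSTEMS = 90        # size of the batch from 2000 (from the slide)
FAILED_SYSTEMS = 45       # "half of 90 systems", assumed to be exactly half
REPAIRED_SO_FAR = 16      # from the slide
COST_PER_BOARD_USD = 20   # "~$20/board", assuming one board per system


def repair_cost(n_systems, cost_per_board=COST_PER_BOARD_USD):
    """Estimated repair cost in USD for n_systems, one board each."""
    return n_systems * cost_per_board


print("spent so far:      $", repair_cost(REPAIRED_SO_FAR))
print("all failed systems: $", repair_cost(FAILED_SYSTEMS))
print("entire batch:       $", repair_cost(TOTAL_SYSTEMS))
```

Even repairing the entire batch comes to an estimated amount well under the price of a single replacement node, which is consistent with the plan to repair every system.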