Working Group updates, Suite Tests and Scalability, Race conditions,
SSS-OSCAR Releases and Hackerfest
Al Geist
August 17-19, 2005
Oak Ridge, TN
Welcome to Oak Ridge National Lab! First quarterly meeting here.
Demonstration: Faster than Light Computer
Able to calculate the answer before the problem is specified
Scalable Systems Software

Participating Organizations: ORNL, ANL, LBNL, PNNL, NCSA, PSC, SNL, LANL, Ames
Industry: IBM, Cray, Intel, SGI
Problem
• Computer centers use incompatible, ad hoc sets of systems tools
• Present tools are not designed to scale to multi-Teraflop systems

Goals
• Collectively (with industry) define standard interfaces between system components for interoperability
• Create scalable, standardized management tools for efficiently running our large computing centers
Focus areas: Resource Management; Accounting & User Management; System Build & Configure; Job Management; System Monitoring

To learn more visit www.scidac.org/ScalableSystems
Scalable Systems Software Suite (SSS-OSCAR) – any updates to this diagram?

[Suite architecture diagram] Meta Services: Meta Scheduler, Meta Monitor, Meta Manager, and Grid Interfaces. Core components: Accounting, Event Manager, Service Directory, Scheduler, Node State Manager, Allocation Management, Process Manager, Usage Reports, System & Job Monitor, Job Queue Manager, Node Configuration & Build Manager, Checkpoint/Restart, Hardware Infrastructure Manager, and Validation & Testing. All components are connected through standard XML interfaces with common authentication and communication.

Components written in any mixture of C, C++, Java, Perl, and Python can be integrated into the Scalable Systems Software Suite.
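As an illustration of this XML-interface integration model, here is a minimal Python sketch of a client querying a service directory for registered components. The element names, host, port, and framing are invented for this sketch and are not the actual SSS wire protocol; real suite components also authenticate through the communication library (ssslib).

import socket
import xml.etree.ElementTree as ET

def query_service_directory(host="localhost", port=5000):
    # Build a small XML request asking which components are registered.
    request = ET.Element("Request", type="Query")
    ET.SubElement(request, "Get", name="Component")
    payload = ET.tostring(request)

    with socket.create_connection((host, port)) as sock:
        sock.sendall(payload)
        sock.shutdown(socket.SHUT_WR)   # signal end of request
        reply = b"".join(iter(lambda: sock.recv(4096), b""))

    # Parse the XML reply into (name, location) pairs.
    root = ET.fromstring(reply)
    return [(c.get("name"), c.get("location"))
            for c in root.iter("Component")]

Because the interface is the XML on the wire rather than a language binding, a Perl scheduler and a Python monitor can interoperate as long as both speak the agreed schema.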
Components in Suites
Gold (accounting & allocation), EM (event manager), SD (service directory), Grid scheduler, Warehouse MetaManager, Maui sched, NSM (node state manager), PM (process manager), Usage Reports, Meta Services, Warehouse (Supermon, NWPerf), Bamboo QM (job queue manager), BCM (build & configuration manager), ssslib, BLCR (checkpoint/restart), APITest, HIM (hardware infrastructure manager).
Multiple component implementations exist.
Compliant with PBS and LoadLeveler job scripts.
Scalable Systems Users
Production use today:
• Running an SSS suite at ANL and Ames
• ORNL industrial cluster (soon)
• Running components at PNNL
• Maui w/ SSS API (3000/mo), Moab (Amazon, Ford, TeraGrid, …)

Who can we involve before the end of the project?
• National Leadership-class facility? NLCF is a partnership between ORNL (Cray), ANL (BG), PNNL (cluster)
• NERSC and NSF centers: NCSA cluster(s), NERSC cluster? NCAR BG
Goals for This Meeting
• Updates on the Integrated Software Suite components
• Change in Resource Management Group – Scott Jackson left PNNL
• Planning for SciDAC phase 2 – discuss new directions
• Preparing for next SSS-OSCAR software suite release – what needs to be done at hackerfest?
• Getting more outside users – production use of and feedback on the suite
Since Last Meeting
• FastOS Meeting in DC – any chatting about leveraging our system software?
• SciDAC 2 Meeting in San Francisco – Scalable Systems poster, talk on ISICs; several SSS members there. Anything to report?
• Telecoms and new entries in electronic notebooks – pretty sparse since last meeting
Agenda – August 17
8:00 Continental Breakfast, CSB room B226
8:30 Al Geist – Project Status
9:00 Craig Steffen – Race Conditions in Suite
9:30 Paul Hargrove – Process Management and Monitoring
10:30 Break
11:00 Todd Kordenbrock – Robustness and Scalability Testing
12:00 Lunch (on own at cafeteria)
1:30 Brett Bode – Resource Management components
2:30 Narayan Desai – Node Build, Configure, Cobalt status
3:30 Break
4:00 Craig Steffen – SSSRMAP in ssslib
4:30 Discuss proposal ideas for SciDAC 2; discussion of getting SSS users and feedback
5:30 Adjourn for dinner
Agenda – August 18
8:00 Continental Breakfast
8:30 Thomas Naughton – SSS-OSCAR software releases
9:30 Discussion and voting
• Your name here
10:30 Group discussion of ideas for SciDAC-2
11:30 Discussion of Hackerfest goals; set next meeting date/location
12:00 Lunch (walk over to cafeteria)
1:30 Hackerfest begins, room B226
3:00 Break
5:30 (or whenever) break for dinner
Agenda – August 19
8:00 Continental Breakfast
8:30 Hackerfest continues
12:00 Hackerfest ends
What is going on in SciDAC 2
• Executive Panel
• Five workshops in past 5 weeks
• Preparing a SciDAC 2 program plan at LBNL today!
• ISIC section has words about system software and tools
View to the Future
HW, CS, and Science Teams all contribute to the science breakthroughs.

[Diagram] Breakthrough Science emerges from: SciDAC Science Teams posing high-end science problems; SciDAC CS teams providing tuned codes, software & libraries, and research; a computing environment with a common look & feel across diverse HW; and ultrascale hardware – leadership-class platforms (Rainier, Blue Gene, Red Storm) with their OS/HW teams.
SciDAC Phase 2 and CS ISICs
Future CS ISICs need to be mindful of the needs of:
• The National Leadership Computing Facility, w/ Cray, IBM BG, SGI, clusters, and multiple OSes – no one architecture is best for all applications
• SciDAC Science Teams – needs depend on application areas chosen; end stations? Do they have special SW needs?
• FastOS research projects – complement, don't duplicate these efforts
• The Cray software roadmap – making the Leadership computers usable, efficient, fast
Gaps and potential next steps
• Heterogeneous leadership-class machines – science teams need a robust environment that presents similar programming interfaces and tools across the different machines
• Fault tolerance requirements in apps and systems software, particularly as systems scale up to petascale around 2010
• Support for application users submitting interactive jobs – computational steering as a means of scientific discovery
• High-performance file system and I/O research – increasing demands of security, scalability, and fault tolerance
• Security – one-time passwords and their impact on scientific progress
Heterogeneous Machines
• Heterogeneous architectures – vector, scalar, SMP, hybrids, clusters. How is a science team to know what is best for them?
• Multiple OSes, even within one machine, e.g., Blue Gene, Red Storm. How to effectively and efficiently administer such systems?
• Diverse programming environment – science teams need a robust environment that presents similar programming interfaces and tools across the different machines
• Diverse system management environment – managing and scheduling multiple node types; system updates, accounting, … everything will be harder in round 2
Fault Tolerance
• Holistic fault tolerance – research into schemes that take into account the full impact of faults: application, middleware, OS, and hardware
• Fault tolerance in systems software – research into prediction and prevention; survivability and resiliency when faults cannot be avoided
• Application recovery – transparent failure recovery; research into intelligent checkpointing based on active monitoring, sophisticated rule-based recovery, diskless checkpointing, … (see the sketch after this list)
• For petascale systems, research into recovery w/o checkpointing
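To ground the checkpointing discussion, here is a minimal application-level checkpoint/restart sketch in Python. It is illustrative only: the suite's actual mechanism is BLCR, which checkpoints processes at the kernel level without application changes, and the file name and state layout below are assumptions.

import os
import pickle

CKPT = "state.ckpt"          # hypothetical checkpoint file name

def save_checkpoint(state):
    # Write to a temp file, then rename atomically so a crash
    # mid-write cannot corrupt the previous checkpoint.
    tmp = CKPT + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, CKPT)

def load_checkpoint():
    # Resume from the last checkpoint if one exists.
    if os.path.exists(CKPT):
        with open(CKPT, "rb") as f:
            return pickle.load(f)
    return {"step": 0, "result": 0.0}   # fresh start

state = load_checkpoint()
for step in range(state["step"], 1000):
    state["result"] += step * 1e-6      # stand-in for real computation
    state["step"] = step + 1
    if state["step"] % 100 == 0:        # checkpoint every 100 steps
        save_checkpoint(state)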
Interactive Computing
• Batch jobs are not always the best for science – good for large numbers of users and a wide mix of jobs, but the National Leadership Computing Facility has a different focus
• Computational steering as a paradigm for discovery – break the cycle: simulate, dump results, analyze, rerun simulation (a toy steering loop is sketched after this list); more efficient use of the computer resources
• Needed for application development – scaling studies on terascale systems; debugging applications which only fail at scale
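The contrast with the batch cycle can be made concrete with a toy steering loop in Python: the simulation polls for parameter updates between steps instead of requiring a dump-analyze-rerun round trip. The control-file mechanism and parameter names are invented for illustration.

import json
import os

params = {"viscosity": 1.0}   # hypothetical steerable parameter
CONTROL = "steer.json"        # hypothetical control file a user edits

for step in range(10_000):
    # ... advance the simulation one step using params ...
    if step % 100 == 0 and os.path.exists(CONTROL):
        with open(CONTROL) as f:
            params.update(json.load(f))   # apply the user's new settings
        os.remove(CONTROL)                # consume the update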
File System and I/O Research
• Lustre is today's answer – there are already concerns about its capabilities as systems scale up to 100+ TF
• What is the answer for 2010? Research is needed to explore the file system and I/O requirements of the petascale systems that will be here in 5 years
• I/O continues to be a bottleneck in large systems – hitting the memory access wall on a node; too expensive to scale I/O bandwidth with Teraflops across nodes
• Research needed to understand how to structure applications or modify I/O to allow applications to run efficiently (see the sketch after this list)
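One direction for restructuring application I/O is collective I/O, where processes coordinate their writes so the MPI-IO layer can aggregate many small requests into a few large ones. A minimal sketch with mpi4py, with an arbitrary file name and block size:

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

block = np.full(1 << 20, rank % 256, dtype=np.uint8)   # 1 MiB per rank

# All ranks write disjoint, contiguous blocks of one shared file
# in a single coordinated call.
fh = MPI.File.Open(comm, "shared.dat",
                   MPI.MODE_CREATE | MPI.MODE_WRONLY)
fh.Write_at_all(rank * block.nbytes, block)            # collective write
fh.Close()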
Security
• New, stricter access policies at computer centers – attacks on supercomputer centers have gotten worse
• One-time passwords, PIV? Sites are shifting policies, tightening firewalls, going to SecurID tokens
• Impact on scientific progress – collaborations within international teams; foreign nationals' clearance delays; access to data and computational resources
• Advances required in system software – to allow compliance with different site policies and handle the tightest requirements; study how to reduce the impact on scientists
Meeting notes
Al Geist – see slides.
Craig Steffen – Exciting new race condition!
• Nodes go offline – Warehouse doesn't find out quickly enough
• Event manager, scheduler, lots of components affected
• Problem grows linearly with system size
• Order of operations needs to be considered – something we haven't considered before
• Issue can be reduced, but can't be solved
• Good discussion on ways to reduce race conditions (one mitigation idea is sketched below)
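One of the mitigation ideas, having the scheduler distrust stale monitor data, can be sketched in Python. This is a hypothetical illustration, not the fix the group adopted; the class, method names, and threshold are invented.

import time

STALE_AFTER = 5.0    # seconds; assumed freshness bound

class NodeView:
    """Scheduler-side view of node state, fed by monitor events."""

    def __init__(self):
        self.last_report = {}    # node -> (state, timestamp)

    def on_monitor_event(self, node, state):
        # Called when a Warehouse-style monitoring event arrives.
        self.last_report[node] = (state, time.monotonic())

    def schedulable(self, node):
        # Refuse to place work on a node whose last report is stale,
        # shrinking the window in which an offline node still looks up.
        state, ts = self.last_report.get(node, ("unknown", 0.0))
        fresh = (time.monotonic() - ts) < STALE_AFTER
        return state == "up" and fresh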
SSS use at NCSA
• Paul Egli rewrote Warehouse – many new features added
• Now monitoring sessions
• All configuration is dynamic
• Multiple debugging channels
• Sandia user – tested to 1024 virtual nodes
• Web site: http://arrakis.ncsa.uiuc.edu/warehouse/
• New hire full time on SSS
• Lining up T2 scheduling (500 proc)
Meeting notes
Paul Hargrove – Checkpoint Manager (BLCR) status
• AMD64/EM64T port now in beta (crashes some users' machines)
• Recently discovered kernel panic during signal interaction (must fix at hackerfest)
• Next step: process groups/sessions – begins next week
• LRS-XML and events "real soon now"
• Open MPI checkpoint/restart support by SC2005
• Torque integration done at U. Mich. for a PhD thesis (needs hardening)
Process manager – MPD rewrite ("refactoring"); getting a PM stable and working on BG.
Todd Kordenbrock – Scalability and robustness tests
• ESP2 efficiency ratio 0.9173 on 64 nodes
• Scalability – Bamboo 1000-job submission
• Gold (Java version) – reservation slow; Perl version not tested
• Warehouse – up to 1024 nodes
• Maui on 64 nodes (needs more testing)
• Durability – node warm stop: 30 seconds to Maui notification; node warm start: 10 seconds; node cold stop: 30 seconds
Meeting notes
Todd Kordenbrock – testing continued
• Single node failure – good
• Resource hog (stress)
• Resource exhaustion on a service node (Gold fails in logging package)
• Anomalies: Maui, Warehouse, Gold, happy, nsm
To do:
• Test BLCR module
• Retest on larger cluster
• Get latest release of all software and retest
• Write report on results
Meeting notes
Brett Bode – RM status
• New release of components: Bamboo v1.1, Maui 3.2.6p13, Gold 2b2.10.2
• Gold being used on Utah cluster
• SSS suite on several systems at Ames
• New Fountain component – front end for Supermon, Ganglia, etc.
• Demoed new tool called Goanna for looking at Fountain output; has the same interface as Warehouse – could plug right in
• General release of Gold 2.0.0.0 available; new Perl CGI GUI – no Java dependency at all in Gold
• X509 support in Mcom (for Maui and Silver)
• Cluster scheduler – bunch of new features
• Grid scheduler – enabled basic accounting for grid jobs
• Future work – Gary needs to get up to speed on Gold code; make it all work with LRS
Meeting notes
Narayan – LRS conversion status
• All components in the center cloud converted to LRS: Service Directory, Event Manager, BCM stack, Process Manager
• Targeted for SC05 release
• ssslib changeover – completed
• SDK support – completed
Cobalt overview
• SSS suite on Chiba and BG
• Motivations – scalability, flexibility, simplicity, support for research ideas
• Tools included: parallel programming tools
• Porting has been easy – now running on Linux, MacOS, and BG/L
• Only about 5K lines of code; targeted for Cray XT3, X1, ZeptoOS
• Unique features – small-partition support on BG/L, OS spec support
• Agile – swap out components; user and admin requests easier to satisfy
• Running at ANL and NCAR (evaluation at other BG sites); may be running on JAZZ soon
• Future – better scheduler, new platforms, more front ends, better docs