DEISA

Forschungszentrum Jülich in der Helmholtz-Gemeinschaft
Achim Streit
www.deisa.org
Agenda
Introduction
SA3: Resource Management
DEISA Extreme Computing Initiative
Conclusion
The DEISA Consortium
DEISA is a consortium of leading national supercomputer centers in Europe
IDRIS – CNRS, France
FZJ, Jülich, Germany
RZG, Garching, Germany
CINECA, Bologna, Italy
EPCC, Edinburgh, UK
CSC, Helsinki, Finland
SARA, Amsterdam, The Netherlands
HLRS, Stuttgart, Germany
BSC, Barcelona, Spain
LRZ, Munich, Germany
ECMWF (European Organization), Reading, UK
Funded by the European Union under FP6. Grant period: May 1, 2004 – April 30, 2008.
DEISA objectives
To enable Europe’s terascale science through the integration of Europe’s most powerful supercomputing systems.
Enabling scientific discovery across a broad spectrum of science and technology is the only criterion for success.
DEISA is a European supercomputing service built on top of existing national services.
DEISA deploys and operates a persistent, production quality, distributed, heterogeneous supercomputing environment with continental scope.
Basic requirements and strategies for the DEISA Research Infrastructure
Fast deployment of a persistent, production quality, grid empowered supercomputing infrastructure with continental scope.
A European supercomputing service built on top of existing national services requires reliability and non-disruptive behavior.
User and application transparency
Top-down approach: technology choices result from the business and operational models of our virtual organization. DEISA technology choices are fully open.
The DEISA supercomputing Grid: a layered infrastructure
Inner layer: a distributed super-cluster resulting from the deep integration of similar IBM AIX platforms at IDRIS, FZ-Jülich, RZG-Garching and CINECA (phase 1), then CSC (phase 2). It looks to external users like a single supercomputing platform.

Outer layer: a heterogeneous supercomputing Grid:
IBM AIX super-cluster (IDRIS, FZJ, RZG, CINECA, CSC), close to 24 Tf
BSC, IBM PowerPC Linux system, 40 Tf
LRZ, Linux cluster (2.7 Tf) moving to an SGI ALTIX system (33 Tf in 2006, 70 Tf in 2007)
SARA, SGI ALTIX Linux cluster, 2.2 Tf
ECMWF, IBM AIX system, 32 Tf
HLRS, NEC SX8 vector system, close to 10 Tf
Logical view of the phase 2 DEISA network

[Diagram: the national research networks DFN, RENATER, GARR, SURFnet, UKERNA, RedIRIS and FUnet, interconnected through GÉANT]
AIX Super-Cluster May 2005

[Diagram: the AIX super-cluster sites, with CSC and ECMWF joining]
Services:
High performance data grid via GPFS: access to remote files uses the full available network bandwidth (see the sketch below).
Job migration across sites: used to load-balance the global workflow when a huge partition is allocated to a DEISA project at one site.
Common Production Environment.
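What the GPFS data grid gives applications in practice is one namespace: a file written at one site is visible under the same path at every other site. A minimal sketch, assuming a hypothetical shared mount point /deisa/home (the actual DEISA mount layout is not specified here):

```python
# Hypothetical illustration of the WAN-wide GPFS namespace.
# /deisa/home is an assumed mount point, not DEISA's actual layout.
from pathlib import Path

shared = Path("/deisa/home/project42/results.dat")

# A job step running at FZJ writes its output to the global file system...
shared.write_bytes(b"checkpoint data")

# ...and a later step scheduled at CINECA reads the very same path,
# with GPFS moving the blocks over the WAN.
data = shared.read_bytes()
print(f"read {len(data)} bytes from the global namespace")
```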
Service Activities

SA1 – Network Operation and Support (FZJ): deployment and operation of a gigabit-per-second network infrastructure for a European distributed supercomputing platform; network operation and optimization during project activity.

SA2 – Data Management with Global File Systems (RZG): deployment and operation of global distributed file systems, as basic building blocks of the “inner” super-cluster, and as a way of implementing global data management in a heterogeneous Grid.

SA3 – Resource Management (CINECA): deployment and operation of global scheduling services for the European super-cluster, as well as for its heterogeneous Grid extension.

SA4 – Applications and User Support (IDRIS): enabling the adoption by the scientific community of the distributed supercomputing infrastructure as an efficient instrument for the production of leading computational science.

SA5 – Security (SARA): providing administration, authorization and authentication for a heterogeneous cluster of HPC systems, with special emphasis on single sign-on.
SA3: A Three-Layer Architecture

Basic services: located closest to the operating system of the computing platforms; they enable the operation of a single cluster or multiple clusters through local or extended batch schedulers and other cluster-like features.

Intermediate services: first-level Grid services that allow access to an enlarged Grid-empowered infrastructure, dealing with resource and network monitoring and information systems.

Advanced services: use the previous layers to implement the global management of the distributed resources of the infrastructure.
Logical Layout

Hardware
OS and communication
Resource manager
Policies implementation through the scheduler (workload, advance reservation, accounting)
Services: access, workflow management, co-allocation, brokering, job rerouting, multiple accounting, data staging (a toy brokering sketch follows below)
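To make the brokering and rerouting services concrete: an advanced service consults monitoring data from the intermediate layer and hands the job to one site's local resource manager. A toy sketch; the site table, the numbers and the selection rule are all invented for illustration:

```python
# Toy brokering sketch: pick the first site of the right architecture
# whose batch system reports enough free processors; if none fits,
# the job would be rerouted or queued. Site names are real DEISA
# partners, the numbers are made up.
sites = {
    "FZJ":    {"arch": "Power4", "free_cpus": 128},
    "CINECA": {"arch": "Power4", "free_cpus": 512},
    "SARA":   {"arch": "IA64",   "free_cpus": 64},
}

def broker(job_cpus: int, wanted_arch: str) -> str:
    """Return the first site matching the architecture with room for the job."""
    for name, info in sites.items():
        if info["arch"] == wanted_arch and info["free_cpus"] >= job_cpus:
            return name
    raise RuntimeError("no site can host the job; reroute or queue it")

print(broker(256, "Power4"))  # FZJ is full, so the job lands at CINECA
```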
UNICORE Infrastructure

[Diagram: UNICORE components Gateway 4.1.0, NJS 4.2.0 and TSI 4.1.0, running on J2SE 1.4.2]
Physical Layout: Resource Management

Site    Hardware  OS            Resource manager  Scheduler
IDRIS   Power 4   AIX 5.2       LL                LL backfill
FZJ     Power 4   AIX 5.2       LL                LL backfill
RZG     Power 4   AIX 5.2       LL                LL backfill
CINECA  Power 4   AIX 5.2       LL                LL backfill
CSC     Power 4   AIX 5.2       LL                LL backfill
ECMWF   Power 4   AIX 5.2       LL                LL backfill
SARA    IA64      RHEL+SGI PP   LSF               LSF HPC
LRZ     IA64      RHEL+SGI PP   SGE               SGE
BSC     PPC       SUSE          LL                LL backfill
HLRS    SX        NEC OS        NEC NQE           NEC NQE

UNICORE provides the uniform access layer on top of all ten sites (an example LoadLeveler submission is sketched below).
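Seven of the ten sites run IBM LoadLeveler (LL) with backfill scheduling, so a job routed there ultimately lands in an LL command file. A hedged sketch of such a submission; the class name "deisa", the resource sizes and the executable are placeholders, and only the LL keywords and the llsubmit command themselves are standard LoadLeveler:

```python
# Sketch: submit a parallel job to an AIX/LoadLeveler site.
import subprocess
import tempfile

cmd_file = """\
# @ job_type         = parallel
# @ class            = deisa
# @ node             = 8
# @ tasks_per_node   = 32
# @ wall_clock_limit = 02:00:00
# @ output           = job.$(jobid).out
# @ error            = job.$(jobid).err
# @ queue
./my_mpi_application
"""

with tempfile.NamedTemporaryFile("w", suffix=".job", delete=False) as f:
    f.write(cmd_file)
    path = f.name

# llsubmit is LoadLeveler's job submission command.
subprocess.run(["llsubmit", path], check=True)
```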
Physical Layout: Data Management

Site    Hardware  OS            GPFS client
IDRIS   Power 4   AIX 5.2       Native
FZJ     Power 4   AIX 5.2       Native
RZG     Power 4   AIX 5.2       Native
CINECA  Power 4   AIX 5.2       Native
CSC     Power 4   AIX 5.2       Native
ECMWF   Power 4   AIX 5.2       Native
SARA    IA64      RHEL+SGI PP   Ad hoc
LRZ     IA64      RHEL+SGI PP   Ad hoc
BSC     PPC       SUSE          Native
HLRS    SX        NEC OS        Ad hoc (?)

All sites are tied together by IBM GPFS (General Parallel File System) over WAN.
DEISA Supercomputing Grid services

Workflow management: based on UNICORE plus further extensions and services coming from DEISA’s JRA7 and other projects (UniGrids, …).

Global data management: a well-defined architecture implementing extended global file systems on heterogeneous systems, fast data transfers across sites, and hierarchical data management at a continental scale.

Co-scheduling: needed to support Grid applications running on the heterogeneous environment (a toy co-allocation sketch follows below).

Science Gateways and portals: specific Internet interfaces that hide complex supercomputing environments from end users and facilitate access for new, non-traditional scientific communities.
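Co-scheduling here means obtaining compute slots at several sites for the same time window, for instance for a coupled multi-site run. A toy two-phase sketch; the reserve() call and its behavior are invented placeholders, not a DEISA API:

```python
# Toy two-phase co-allocation: tentatively hold a slot at every site
# for a common start time, then launch only if all holds succeeded.
from datetime import datetime, timedelta

def reserve(site: str, start: datetime, cpus: int) -> bool:
    # A real implementation would talk to the site's scheduler
    # (e.g. an advance reservation); here it always succeeds.
    print(f"hold {cpus} CPUs at {site} from {start:%H:%M} (stub)")
    return True

def co_allocate(sites: list[str], cpus: int) -> datetime:
    start = datetime.now() + timedelta(hours=1)
    holds = [site for site in sites if reserve(site, start, cpus)]
    if len(holds) == len(sites):
        print("all holds confirmed; coupled job can launch")
        return start
    # Releasing partial holds avoids deadlocking resources grid-wide.
    raise RuntimeError("co-allocation failed; release partial holds")

co_allocate(["IDRIS", "HLRS"], cpus=64)
```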
Workflow Application with UNICORE / Global Data Management with GPFS

[Diagram: five sites, each with CPU and GPFS, interconnected over the research networks (+ NRENs); a client submits the job, and its data travels through the shared GPFS]

Job workflow: 1) FZJ, 2) CINECA, 3) RZG, 4) IDRIS, 5) SARA (a schematic of this five-step workflow follows below).
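A schematic of the five-step workflow in the diagram; submit_and_wait() is a stub standing in for the UNICORE client machinery, and the per-site step names are invented for illustration:

```python
# Schematic five-step workflow across the AIX super-cluster.
# The site order mirrors the slide's diagram; the step names are made up.
def submit_and_wait(site: str, step: str) -> None:
    print(f"running '{step}' at {site} (stub)")

workflow = [
    ("FZJ",    "generate initial conditions"),
    ("CINECA", "main simulation"),
    ("RZG",    "refinement run"),
    ("IDRIS",  "post-processing"),
    ("SARA",   "visualization"),
]

for site, step in workflow:
    # Each step reads its predecessor's output directly from the
    # WAN-wide GPFS file system, so no explicit data staging is needed.
    submit_and_wait(site, step)
```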
Resource Management Information System (RMIS)

Deliver up-to-date and complete resource management information about the grid.

Provide relevant information to system administrators at remote sites and to end users.

Our approach:
performed an implementation-independent system analysis
attempted to model the DEISA distributed supercomputer platform designed to operate the grid
identified the resource management part as a sub-system that needs to interface other sub-systems to get relevant information
other sub-systems use external tools (monitoring tools, databases and batch systems) with which we need to interface
Implementation

[Diagram: Ganglia gmond daemons on the cluster nodes and the batch scheduler report through the firewall to a server running Ganglia gmetad and an MDS2 back-end (static and almost static data); configuration files and a web server feed the RMIS web front-end]

Based on the Ganglia monitoring tool coupled with MDS2/Globus. The published data fall into two groups:
static and almost static data (MDS2) – refresh time ~ hours or days
dynamic data (Ganglia) – refresh time ~ seconds or minutes

A web server based on the Ganglia web front-end allows the display of any relevant data from MDS2 or Ganglia (a minimal gmond probe is sketched below).
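The dynamic side is easy to reproduce: gmond publishes its full cluster state as an XML document to any client that connects to its TCP port (8649 by default). A minimal probe, assuming a reachable gmond; the host and the choice of the load_one metric are just examples:

```python
# Minimal Ganglia probe: gmond dumps its cluster state as XML to any
# client that connects to its TCP port (8649 by default).
import socket
import xml.etree.ElementTree as ET

def read_gmond(host: str = "localhost", port: int = 8649) -> ET.Element:
    chunks = []
    with socket.create_connection((host, port), timeout=5) as sock:
        while True:
            data = sock.recv(4096)
            if not data:  # gmond closes the connection after the dump
                break
            chunks.append(data)
    return ET.fromstring(b"".join(chunks))

root = read_gmond()
for host in root.iter("HOST"):
    load = host.find("METRIC[@NAME='load_one']")
    value = load.get("VAL") if load is not None else "n/a"
    print(host.get("NAME"), "load_one =", value)
```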
Portals (Science Gateways)

Same concept as TeraGrid’s Science Gateways.

Needed to enhance the outreach of supercomputing infrastructures.

Hiding complex supercomputing environments from end users, providing discipline-specific tools and support, and moving in some cases towards community allocations.

DEISA has already done work on Genomics and Material Sciences portals.

Intense brainstorming on the design of a global strategy, if possible interoperable with TeraGrid’s Science Gateways.
Enabling science
Initial, “early users” program: a number of Joint Research Activities integrated in the project from the start.
Moving towards “exceptional users”: the DEISA Extreme Computing Initiative
Activity  Scientific program                                            Partners      Leader
JRA1      Enabling Material Science, CPMD codes, portals                RZG           Hermann Lederer, RZG
JRA2      Computational environment for applications in Cosmology      EPCC          Gavin Pringle, EPCC
JRA3      Enabling the TORB Plasma Physics code                         RZG           Hermann Lederer, RZG
JRA4      Life science: genomic and eHealth applications                IDRIS, BSC    Victor Alessandrini, IDRIS
JRA5      CFD in automobile industry                                    CINECA, CRI   Roberto Tregnago, CRI
JRA6      Coupled applications: Astrophysics, Combustion, Environment   IDRIS (HLRS)  Gilles Grasseau, IDRIS
The Extreme Computing Initiative
Identification, deployment and operation of a number of “flagship” applications in selected areas of science and technology
Applications must rely on the DEISA Supercomputing Grid services (application profiles have been clearly defined). They will benefit from exceptional resources from the DEISA pool.
Applications are selected on the basis of scientific excellence, innovation potential, and relevance criteria.
European call for proposals: April 1 – May 30, 2005.
Evaluation and allocation of DEISA resources
National evaluation committees evaluate the proposals and determine priorities.
On the basis of this information, the DEISA consortium examines how the applications map to the resources available in the DEISA pool, and negotiates internally the way the resources will be allocated and the final priorities for projects.
Exceptional DEISA resources will be allocated, as with large scientific instruments, at well-defined time windows (to be negotiated with the users).
DEISA Extreme Computing Initiative (DECI)

Call for Expressions of Interest / Proposals in April and May 2005
50 proposals submitted
Requested CPU time: 32 million CPU-hours
European countries involved: Finland, France, Germany, Greece, Hungary, Italy, Netherlands, Russia, Spain, Sweden, Switzerland, UK

Proposals by area:
Materials Science, Quantum Chemistry, Quantum Computing: 16
Astrophysics (Cosmology, Stars, Solar System): 13
Life Sciences, Biophysics, Bioinformatics: 8
CFD, Fluid Mechanics, Combustion: 5
Earth Sciences, Climate Research: 4
Plasma Physics: 2
QCD, Particle Physics, Nuclear Physics: 2
Conclusions
DEISA adopts Grid technologies to integrate national supercomputing infrastructures and to provide a European supercomputing service.

Service activities are supported by the coordinated action of the national centers’ staff. DEISA operates as a virtual European supercomputing centre.
The big challenge we are facing is enabling new, first class computational science.
Integrating leading supercomputing platforms with Grid technologies creates a new research dimension in Europe.
October 11–12, 2005, ETSI Headquarters, Sophia Antipolis, France
http://summit.unicore.org/2005
In conjunction with Grids@work: Middleware, Components, Users, Contest and Plugtests
http://www.etsi.org/plugtests/GRID.htm