GaiaGrid – A three Year Experience

S.G. Ansari, July 3, 2022. Grid. GaiaGrid – A three Year Experience. Salim Ansari, Toulouse, 20th October, 2005

TRANSCRIPT

Page 1: GaiaGrid – A three Year Experience

S.G. Ansari, April 20, 2023

Grid

GaiaGrid – A three Year Experience

Salim Ansari

Toulouse, 20th October, 2005

Page 2: GaiaGrid – A three Year Experience

Why Grid?

The GDAAS Study had underestimated the computational power necessary to carry out the Gaia Data Analysis prototype.

The number of parallel activities spun out of control, as algorithm providers began delivering algorithms that could not be implemented on the limited infrastructure dedicated to GDAAS.

The need for a collaborative environment became clear.

Page 3: GaiaGrid – A three Year Experience

Objectives

1. to increase computational power whenever and wherever needed at low cost

2. to provide a framework for developing Shell task algorithms for Gaia, and

3. to establish a collaborative environment, where the community may share and exchange results

Page 4: GaiaGrid – A three Year Experience

Constraints

Motto: Low cost, high return on investment

- Low-cost hardware budget: reusability of low-end PCs
- Small investment in industrial effort: [0.5 FTE]
- System Administration: 1 junior staff + maintenance [1 FTE]

Page 5: GaiaGrid – A three Year Experience

Core vs. Shell Tasks

As a result of the GDAAS Study, two categories of algorithms had been established:

Core Tasks (Centralised):
- Initial Data Treatment
- Global Iterative Solution
- Cross-correlations

Core Tasks act upon the totality of the data.

Shell Tasks:
- Classification
- Photometric analysis
- Spectroscopic analysis

A Shell Task is any data analysis that involves remote expertise and acts upon a portion of the data at a time.
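The Core/Shell distinction on this slide can be illustrated with a minimal sketch. It is not GaiaGrid code: the toy catalogue, the cell names and the functions `shell_task`, `core_task` and `run_shell_tasks` are all hypothetical, and a simple mean stands in for a real analysis. The point is only that a shell-style task sees one portion of the data and can therefore be farmed out independently, while a core-style task needs all of the data in one place.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

# Hypothetical toy catalogue: portion (cell) id -> measurements in that portion.
catalogue = {
    "cell-A": [1.0, 2.0, 3.0],
    "cell-B": [4.0, 5.0],
    "cell-C": [6.0],
}


def shell_task(cell_id, measurements):
    """Shell-style task: works on a single portion of the data."""
    return cell_id, mean(measurements)


def core_task(full_catalogue):
    """Core-style task: needs the totality of the data at once."""
    all_values = [v for values in full_catalogue.values() for v in values]
    return mean(all_values)


def run_shell_tasks(full_catalogue, workers=3):
    """Run one independent shell task per portion; local threads stand in for Grid nodes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(shell_task, cell_id, values)
            for cell_id, values in full_catalogue.items()
        ]
        return dict(future.result() for future in futures)


per_cell = run_shell_tasks(catalogue)   # one result per portion
global_mean = core_task(catalogue)      # one result for the whole catalogue
```

Because no shell task depends on another, the portions could be sent to different sites in any order, which is what makes this category a natural fit for a Grid.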

Page 6: GaiaGrid – A three Year Experience

Gaia Virtual Organisation June 2005

Page 7: GaiaGrid – A three Year Experience

The Processing Scope

Michael Perryman, GAIA-MP-009, 17 August 2004, Version 1.1

Task          Processing power in total   Duration [1.2 Teraflop machine]*
Core Tasks    40 × 10^18 FLOPs            385 days CPU time on the target ‘2012 machine’
Shell Tasks   90 × 10^18 FLOPs            880 days CPU time on the target ‘2012 machine’
TOTAL         10^21 FLOPs                 Assuming factor 10 in uncertainty

Top Tasks
- GIS processing: 125 days (CPU processing on 2012 machine)
- first-look: 125 days (assumed equal to GIS at present)
- spectro PSF fitting: 71 days
- variability period: 33 days
- various DMS classes: 60 days
- DMS: ASM analysis
- multiples: ASM

* Assuming a 40 GFlop machine today extrapolated to 2012 with Moore’s Law
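The figures on this slide can be cross-checked with a short calculation. The sketch below assumes that "FLOPs" in the table means a total operation count and that the 1.2 Teraflop machine sustains 1.2 × 10^12 operations per second; the function and variable names are illustrative only. Under those assumptions the Core Tasks figure reproduces the quoted 385 days, while the Shell Tasks figure comes out slightly below the quoted 880 days, presumably because of rounding or assumptions in the source report that the slide does not spell out.

```python
import math

SECONDS_PER_DAY = 86_400
TARGET_FLOPS = 1.2e12   # the target '2012 machine' (1.2 Teraflop), per the slide
TODAY_FLOPS = 40e9      # the 40 GFlop machine from the footnote


def days_on_target(total_operations):
    """CPU days needed on the target machine for a given operation count."""
    return total_operations / TARGET_FLOPS / SECONDS_PER_DAY


core_days = days_on_target(40e18)    # about 385.8 days
shell_days = days_on_target(90e18)   # about 868 days

# Sum of both task categories with the factor 10 uncertainty applied:
# 1.3e21, which the slide rounds to 10^21.
total_with_margin = (40e18 + 90e18) * 10

# Growth implied by the footnote: 40 GFlop -> 1.2 TFlop is a factor of 30,
# i.e. a little under five doublings.
growth = TARGET_FLOPS / TODAY_FLOPS
doublings = math.log2(growth)


def doubling_period_years(years_until_2012):
    """Implied doubling time; the starting year is an assumption, not stated on the slide."""
    return years_until_2012 / doublings
```

For example, `doubling_period_years(8)` (taking "today" as the 2004 date of the cited report) gives roughly 1.6 years per doubling, in line with the usual reading of Moore's Law.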

Page 8: GaiaGrid – A three Year Experience

The first months

Setting up the hardware and nodes was easy and took 2 man months.

Globus was installed on:
- ESTEC nodes
- ESRIN was already up and running
- CESCA node in Barcelona
- ULB node in Brussels
- ARI node in Heidelberg

The GridAssist tool was identified as a potential workflow tool.

Page 9: GaiaGrid – A three Year Experience

Task distribution on GaiaGrid

[Diagram labels: GDAAS DB; Gaia Simulator; Core Processing; GridAssist Controller; Data Access Layer; Globus Node, Shell Task; Initial Data Treatment; sites: Barcelona, ESTEC, ULB, ESRIN]

Page 10: GaiaGrid – A three Year Experience

Current Infrastructure

9 Infrastructures in 7 countries (voluntary) [51 CPUs]:
- ESTEC [14 CPUs] (SCI-CI) + 1 Gigabit dedicated link to Surfnet
- ESAC [4 CPUs] (SCI-SD) + 8 Mb link to REDIRIS
- ESRIN [16 CPUs] (EOP) + 155 Mb link to GARR
- CESCA [5 CPUs] (Barcelona) + REDIRIS connectivity
- ARI [2 CPUs] (Heidelberg) + Academic backbone
- ULB [1 CPU] (Brussels) + Academic backbone
- DutchSpace [7 CPUs] (Leiden) + Commercial link
- IoA [1 CPU] (Cambridge) + Academic backbone
- UGE [1 CPU] (Geneva) + Academic backbone

2 Data Storage Elements:
- CESCA [5 Terabytes]
- ESTEC [2 Terabytes]
- ESAC [up to 4 Terabytes]

The current infrastructure has been created on an experimental basis and should not yet be considered part of an operational environment.
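As a quick consistency check on the CPU counts listed on this slide, and as an illustration of how independent jobs might be shared out across such a heterogeneous set of sites, here is a small sketch. The proportional (largest-remainder) split and the function `split_jobs` are assumptions made for illustration; the slides do not describe how GridAssist actually schedules work.

```python
# CPU counts per site, as listed on the slide.
cpus = {
    "ESTEC": 14, "ESAC": 4, "ESRIN": 16, "CESCA": 5, "ARI": 2,
    "ULB": 1, "DutchSpace": 7, "IoA": 1, "UGE": 1,
}

total_cpus = sum(cpus.values())  # matches the 51 CPUs quoted on the slide


def split_jobs(n_jobs, cpu_counts):
    """Share n_jobs among sites in proportion to CPU count (largest-remainder method)."""
    total = sum(cpu_counts.values())
    shares, remainders = {}, {}
    for site, n in cpu_counts.items():
        # Integer arithmetic avoids floating-point rounding surprises.
        shares[site], remainders[site] = divmod(n_jobs * n, total)
    leftover = n_jobs - sum(shares.values())
    # Hand the remaining jobs to the sites with the largest fractional share.
    for site in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        shares[site] += 1
    return shares


# Hypothetical example: 383 independent jobs spread over all nine sites.
shares = split_jobs(383, cpus)
```

A split like this ignores network bandwidth and CPU speed, both of which differ widely between the sites listed, so it should be read as a first approximation only.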

Page 11: GaiaGrid – A three Year Experience

Current Applications

- Gaia Simulator
- Astrometric Binary Star Shell Task
- Variability Star Analysis Shell Task
- RVS Cross Correlation Shell Task

Page 12: GaiaGrid – A three Year Experience

Global Gaia Data Processing

Page 13: GaiaGrid – A three Year Experience

The GridAssist Client

Performance Grid Computation

[Figure labels: Heidelberg, Rome, Leiden, Barcelona]

Page 14: GaiaGrid – A three Year Experience

The GridAssist Client

Distributed Grid Computation

[Figure labels: Barcelona, Brussels, Noordwijk]

Page 15: GaiaGrid – A three Year Experience

Results

The Gaia Simulator profited tremendously from GaiaGrid, which accelerated the simulations of the Astrometric Binary Stars. These would otherwise have needed to be scheduled on a single infrastructure at CESCA, which was at the same time running GDAAS tasks.

The Astrometric Binary Star Analysis for a single HTM cell (383 systems) is down to 15 minutes (and falling) on 2 infrastructures, from 3 hours on a single CPU in Brussels.
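The timing figures quoted for the Astrometric Binary Star Analysis translate into a speed-up factor and a per-system cost as follows. This is a back-of-the-envelope sketch using only the numbers on the slide; the variable names are illustrative, and it assumes the 383 systems take roughly equal time each.

```python
SYSTEMS_PER_CELL = 383

before_seconds = 3 * 3600   # 3 hours on a single CPU in Brussels
after_seconds = 15 * 60     # 15 minutes on 2 infrastructures

# Overall speed-up of the run for one HTM cell.
speedup = before_seconds / after_seconds

# Average wall-clock time per binary system, before and after.
per_system_before = before_seconds / SYSTEMS_PER_CELL
per_system_after = after_seconds / SYSTEMS_PER_CELL

print(f"speed-up: {speedup:.0f}x")
print(f"per system: {per_system_before:.1f} s -> {per_system_after:.2f} s")
```

The run for one cell is thus about twelve times faster, going from roughly 28 seconds to a little over 2 seconds per system on average.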

Page 16: GaiaGrid – A three Year Experience

Possible Implementation: The Gaia Collaboration Environment

[Diagram labels: Binary Star Analysis, Variable Star Analysis, Radial Velocity Cross Correlations, Photometric Analysis, Classification, GaiaLib, Core Interface, Gaia Data Results]

The Gaia Community would develop, analyse and update the data transparently, without having any notion of where each component is running and without having to worry about CPU and storage limitations.

Page 17: GaiaGrid – A three Year Experience

Security Issues

All ESTEC and ESRIN Grid machines lie outside the ESA firewall.

Security is controlled via ESA Grid certification.

Currently no distinction is made between projects (e.g. GaiaGrid and PlanckGrid).

The GridAssist tool provides basic functionality to distinguish an administrator (a person who may add/remove sources) from a workflow user.

Page 18: GaiaGrid – A three Year Experience

Certification

The Certification Authority for the ESA Grid currently lies with ESTEC (SCI-C).

This is under review in light of higher-level discussions within the EIROForum Grid Group.

Page 19: GaiaGrid – A three Year Experience

Future Activities

The GaiaGrid environment is available to anyone wishing to experiment with parallelization and distribution of tasks.

In the current Gaia Data Processing framework, the environment can only be used as standalone.

The possibility of using the Grid environment to also carry out some core tasks is being investigated.

GaiaGrid can be considered the testbed for all algorithms under development.

Page 20: GaiaGrid – A three Year Experience

Conclusions

GaiaGrid has demonstrated that it is easy to set up a Grid environment.

GaiaGrid has also demonstrated its collaborative capabilities by allowing the sharing of results amongst multiple institutes.

The deployment of the Gaia Simulator has led programmers to think more “portably”.

Page 21: GaiaGrid – A three Year Experience

Lessons learned

The development of Gaia algorithms is a task that involves a community of people dispersed across Europe.

No single group should believe that they can implement all of these algorithms without the proper support of the community.

A sound collaboration environment is essential to ensure that everyone in a single community has a common understanding of the problems.

Processing is cheap and the technology is simple, but cumbersome to maintain. Each shell task has to be installed on all the Grid machines used in any Virtual Organisation.

There is no magic to Grid!

The main hurdles in Grid involve security and certification. Who should be allowed to run jobs on my machine(s)?

Grid should always be considered as “added value”, but should not be considered within the scope of day-to-day operations like the data processing in Gaia (if it becomes that, you have underestimated the effort of carrying out your project and should review your internal resources for the long term).