ESA UNCLASSIFIED - For Official Use

A Science Discovery Portal for Euclid
Euclid Scientific Archive System

Sara Nieto on behalf of
ESAC Science Data Centre (ESDC)
Euclid Science Operations Centre (SOC)




ESA UNCLASSIFIED - For Official Use ESA | 06/10/2019 | Slide 2

Euclid Goals

• Discover the origin of the Universe's accelerating expansion.
• Discover the nature of 95% of the Universe: dark energy and dark matter.
• Measure shapes of galaxies distorted by gravitational deflection due to dark matter.
• Measure non-random distribution of galaxies resulting from the action of gravity.

[Pie chart: Dark Energy 76%, Dark Matter 20%, Ordinary Matter 4%]


Euclid Mission Overview

• Launch in Q2 2022, 6-year mission duration
• 1.2m telescope, L2 orbit
• Visible-light camera and a near-infrared camera/spectrometer
• ESA is responsible for the mission
• The Euclid Consortium (EC) will supply ESA with the instruments and most of the SGS
• Euclid Consortium & other teams
• 15 countries, 130 institutes, 1300 consortium members


Euclid Archive System Overall Architecture

SAS Components:
- SAS-MAL: Metadata Access Service
- SAS-MDR: Metadata Repository
- SAS-MTS: Metadata Transfer Service
- SAS-AUS: Archive User Services
- SAS-CLI: Command Line Interface
- SAS-GUI: Graphical User Interface
- SEDM: Science Exploitation Data Model

Talk on DPS and DSS by Rees Williams on Tuesday at 17:00


Euclid Science

Images, Catalogues, Spectra

Existing ESA archives shown: sky.esa.int, gea.esac.esa.int, nxsa.esac.esa.int

Talk on Gaia datasets by Juan Gonzalez on Wednesday at 13:45

LE3


ESA Science Archives Volume Evolution (2000-2030)


Euclid Science Volume and Data Releases
• 10PB of data, ~45000 observations in a 6-year mission
• Wide survey (15000 deg²) and Deep survey (40 deg²)
• Catalogue: ~1.50PB
  • MER: 50TB
  • SPE columns: 244TB
  • PHZ columns: 188TB
  • SHE columns: 1PB
• Imaging: ~5.5PB
  • VIS & NIR: 3.5PB
  • MER: 2PB
• Spectra: ~3.2PB
• Other archive products, HiPS maps
• External data (3-5PB): KiDS, etc.
• Mission data processing could generate 26 PB/year
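The per-product figures above can be cross-checked against the quoted totals. The following is a minimal sketch, assuming decimal units (1 PB = 1000 TB) and using only the numbers on this slide; the variable names are illustrative.

```python
# Volumes quoted on the slide, converted to terabytes (assuming 1 PB = 1000 TB).
catalogue_tb = {"MER": 50, "SPE columns": 244, "PHZ columns": 188, "SHE columns": 1000}
imaging_tb = {"VIS & NIR": 3500, "MER": 2000}
spectra_tb = 3200

# Sum each product family, then the three families together.
catalogue_total = sum(catalogue_tb.values())
imaging_total = sum(imaging_tb.values())
archive_total = catalogue_total + imaging_total + spectra_tb

print(f"Catalogue: {catalogue_total} TB")   # slide quotes ~1.50 PB
print(f"Imaging:   {imaging_total} TB")     # slide quotes ~5.5 PB
print(f"Total:     {archive_total} TB")     # slide quotes 10 PB
```

The sums come to 1482 TB for catalogues and 10182 TB overall, which is consistent with the rounded ~1.50PB and 10PB figures. External data and other archive products are not included in this total.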

[Figure: Data Volume of Euclid DRs (TB), split into Catalogues, Pixel data and Spectra, for DR1, DR2 and DR3 (2024, 2026, 2029); vertical axis from 0 to 12000 TB]


The Euclid Science Archive System

[Architecture diagram labels]
- Scientific Community
- WEB Portal
- Science Exploitation Platform
- VO Compliant APIs: TAP+, DataLink, SIA, SODA, SSA, VOSpace
- Ingestion Layer
- MPP DB
- Petascale Data Storage
- open source


SAS v0.9: Euclid HiPSized simulations


SAS v0.9: External data, DES HiPS maps


SAS v0.9: Browse simulated observations


SAS v0.9: ADQL Query Interface
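As an illustration of the kind of query such an interface accepts, here is a minimal sketch that builds a standard ADQL cone search as a string. The table name and the `ra`/`dec` column names are hypothetical; the actual SAS schema is not given in these slides.

```python
def cone_search_adql(table, ra_deg, dec_deg, radius_deg, limit=100):
    """Build an ADQL cone-search query string.

    `table` and the `ra`/`dec` column names are placeholders, not the
    real SAS schema. All angles are in degrees.
    """
    return (
        f"SELECT TOP {int(limit)} * FROM {table} "
        f"WHERE 1 = CONTAINS(POINT('ICRS', ra, dec), "
        f"CIRCLE('ICRS', {ra_deg}, {dec_deg}, {radius_deg}))"
    )

# Example: sources within 0.1 deg of (RA, Dec) = (10.5, -30.0).
query = cone_search_adql("hypothetical_schema.sources", 10.5, -30.0, 0.1, limit=5)
print(query)
```

`TOP`, `CONTAINS`, `POINT` and `CIRCLE` are part of ADQL itself, so the same pattern should carry over once the real table and column names are substituted.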


SAS Web Portal v0.9: Overlay of query results

Query Results


SAS Web Portal v0.9: Mosaics


SAS Web Portal v0.9: Footprints of observations


SAS Data, Service and Client Layers

DATA

Catalogues
- EUC Flagship 2.7B
- KiDS DR#4 100M
- DES Lepus 3.5M

Images and HiPS maps
- SC#3
- DES, DES Lepus
- KiDS DR#4

SC#456 new data

Footprints
- Observations
- Mosaics

Services
- EuclidSky
- Overlay of sources
- Overlay of user results
- Catalogue searches
- Metadata searches
- Download service
- Cutout of images
- Plotting service
- Spectra
- DPS Interface
- SEPP interface
- X-match

Client

WEB Portal
- EuclidSky explorer
  - HiPS maps
  - Catalogues
  - Footprints
- Products browser
- Plotting browser
- Spectra visualizer
- AstroQuery modules
  - TAP+
  - Cutout
- JupyterLab

SAS v0.10


Petascale Infrastructure

• GreenPlum (open source) now used as DB system in SAS, replacing PostgreSQL
• 10 VM nodes: 128GB master and 32GB workers, with 2TB/node on NetApp
• Catalogue of ~3 billion sources, ~100 columns
• Current NetApp NFS storage
• Ceph PoC for scaling out ESDC storage
  • Massively scalable (to exabytes)
  • Highly reliable
  • Easy to manage
  • Open source
• Apache Spark PoC
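For a rough sense of scale, the catalogue size quoted above can be turned into a back-of-envelope storage estimate. This is only a sketch: the 8 bytes per value is an assumption (the slides do not give column types), and it ignores indexes, row overhead, compression and replication.

```python
# Figures from the slide.
N_SOURCES = 3_000_000_000   # ~3 billion sources
N_COLUMNS = 100             # ~100 columns
N_NODES = 10                # VM nodes
STORAGE_PER_NODE_TB = 2     # 2TB per node on NetApp

# Assumption, not from the slides: every value is stored as an 8-byte number.
BYTES_PER_VALUE = 8

raw_bytes = N_SOURCES * N_COLUMNS * BYTES_PER_VALUE
raw_tb = raw_bytes / 10**12                      # decimal terabytes
cluster_tb = N_NODES * STORAGE_PER_NODE_TB
fraction = raw_tb / cluster_tb

print(f"Raw catalogue size: {raw_tb:.1f} TB")
print(f"Cluster storage:    {cluster_tb} TB")
print(f"Fraction used:      {fraction:.0%}")
```

Under these assumptions the raw table is about 2.4 TB, a modest share of the 20 TB across the ten nodes, before any database overhead is counted.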


Move code to the data: ESA SEPP

[Diagram labels: Integration, Discovery, Data, Analysis, Application, Workspace, Interoperability, Pipeline, User Management, Logging & Monitoring, Registry, Resource Management, Test, System Engineering, Storage Infrastructure, Computing Infrastructure, Portal, Desktop Apps]

Talk By Vicente Navarro on Tuesday at 16:30


Questions

Thanks for your attention


Additional information


Science Archive participation in Euclid Challenges

• The Scientific Archive has officially participated in the Euclid Challenges since #3
• Challenges roadmap up to launch in 2022
  • 3 deg² (9 fields)
  • 30 deg² (~109 fields), ongoing
  • 300 deg² wide (~1000 fields)
  • 3000 deg² wide (~10000 fields)
• Internal Data Releases after each challenge
• Switch of Data Processing System @ESAC as Master in 2020

[Figure: Challenge Data Volume (TB) for SC456, SC7 and SC8, split into Catalogues, Pixel data and Spectra; vertical axis from 0 to 1800 TB]


Apache Spark Proof of Concept

• Spark: distributed parallel processing framework
• Use cases: analysis of large catalogues and pixel data (source extraction)
• 6 VM workers of 32GB RAM
• 8 cores in standalone mode
• NetApp (NFS) storage
• Spark ecosystem >> Astronomy
• Example: AXS (latest ADASS)
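The PoC sizing above could translate into Spark settings along the following lines. This is a hedged sketch, not the configuration actually used: the master hostname is hypothetical, "8 cores" is read as 8 cores per worker, and the 28 GB executor memory is an assumed value that leaves some RAM on each 32GB VM for the operating system. The property names themselves are standard Spark configuration keys.

```python
# Figures from the slide.
N_WORKERS = 6
RAM_PER_WORKER_GB = 32
CORES_PER_WORKER = 8        # assumption: the slide's "8 cores" is per worker

# Assumption: keep some headroom below the 32GB of each VM.
EXECUTOR_MEMORY_GB = 28

spark_conf = {
    "spark.master": "spark://hypothetical-master:7077",   # standalone-mode URL, hypothetical host
    "spark.executor.memory": f"{EXECUTOR_MEMORY_GB}g",
    "spark.executor.cores": str(CORES_PER_WORKER),
    "spark.cores.max": str(N_WORKERS * CORES_PER_WORKER),
}

def to_submit_args(conf):
    """Turn a config dict into `--conf key=value` arguments for spark-submit."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args

total_ram_gb = N_WORKERS * RAM_PER_WORKER_GB
print(f"Cluster RAM: {total_ram_gb} GB, max cores: {spark_conf['spark.cores.max']}")
print(" ".join(to_submit_args(spark_conf)))
```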


Courtesy of EC


Euclid Simulations Data Flow

Courtesy of EC & ESA


Courtesy of EC


SAS Component Diagram


Interoperability VO Driven
• SEDM based on VODM Standards:
  • ObsCore, ProvenanceDM, SourceDM ?
• TAP+ (Table Access Protocol)
• ADQL (Astronomical Data Query Language)
• UWS (Universal Worker Service)
• VOSpace (Virtual Observatory space)
• HiPS (Hierarchical Progressive Surveys)
• SODA (Server-side Operations for Data Access)
• SIAP (Simple Image Access Protocol)
• DataLink
• Euclid SEDM evolves with the ECDM
• SEDM v0.9 is based on ECDM 1.8.2
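To show how TAP and ADQL fit together, here is a minimal sketch that encodes an ADQL query into a synchronous TAP request URL. The base URL is a placeholder, not the SAS endpoint, and the parameter names follow the TAP 1.0 convention (`REQUEST`, `LANG`, `FORMAT`, `QUERY`); whether TAP+ accepts exactly this form is an assumption.

```python
from urllib.parse import urlencode

def tap_sync_url(base_url, adql, fmt="votable"):
    """Build a synchronous TAP query URL (TAP 1.0 style parameters).

    `base_url` is the root of a TAP service; the value used below is a
    placeholder and does not point at a real archive.
    """
    params = {
        "REQUEST": "doQuery",   # run a query
        "LANG": "ADQL",         # query language
        "FORMAT": fmt,          # requested output format
        "QUERY": adql,          # the ADQL text, URL-encoded by urlencode
    }
    return f"{base_url.rstrip('/')}/sync?{urlencode(params)}"

url = tap_sync_url("https://example.invalid/tap", "SELECT TOP 1 * FROM t")
print(url)
```

A long-running query would instead go to the asynchronous endpoint, where the job is managed through UWS, but that flow is not sketched here.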