idia pipelines - university of cape town


IDIA Pipelines

Bradley Frank (SARAO / IDIA)

Srikrishna Sekhar, Jordan Collier, David Aikema, Russ Taylor, Sourabh Paul


IDIA Pipelines

• Initiated in 2017 in response to the IDIA Call for Projects.

• Umbrella Programme for Science Data Processing at IDIA.

• Data transport and management (with SARAO).

• Provision of Software (via Singularity Containers).

• Design, use and testing of Virtual Machine infrastructures.

• Fat Nodes vs Clusters.

• Prototype Astronomer User Models.


General User Model

[Figure: a head node, worker nodes and a fat node linked by control and data paths, with distributed storage (BeeGFS / CEPH) on a high-speed network mount. Software is provided as Singularity containers; workflow/resource management on the VHPC uses SLURM.]
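A minimal sketch of how a job might be submitted under this model: SLURM allocates the resources and `srun` runs each task inside a Singularity container. The helper name, image path, script name and resource values are hypothetical; only standard `#SBATCH` options and `singularity exec` are assumed.

```python
import shlex
from typing import Sequence


def make_sbatch_script(job_name: str, image: str, command: Sequence[str],
                       nodes: int = 1, tasks_per_node: int = 1,
                       mem_gb: int = 8, time_limit: str = "01:00:00") -> str:
    """Return the text of a SLURM batch script that runs `command`
    inside the Singularity image `image` (hypothetical helper)."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={tasks_per_node}",
        f"#SBATCH --mem={mem_gb}G",
        f"#SBATCH --time={time_limit}",
        # srun starts the tasks on the allocated nodes; each task runs the
        # command inside the container, so the software stack lives in the
        # image rather than on the nodes.
        f"srun singularity exec {shlex.quote(image)} {shlex.join(command)}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical image and script names, for illustration only.
    print(make_sbatch_script("demo", "/path/to/casa.simg",
                             ["python", "myscript.py"]))
```

The generated text would be saved to a file and submitted with `sbatch`.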


MeerKAT Data

• Level 0: raw data (32k channels, 0.5 s dumps).

• An 8-hr, full-polarisation, full-resolution dataset is ~20 TB.

• Difficult to move: can we pre-process it first?

• HPC CASA at the CHPC.

• Level 1: Calibrated/Imaged at KAPB.

• SDP Pipeline: Corrected Data, Flags + Calibration Tables.

• Potentially smaller volume/averaged?

• Initially meant for QA purposes.

• Can we use L1 for science?

• DQA.
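The raw-data volume follows from simple arithmetic: baselines × channels × polarisations × dumps × bytes per visibility. A minimal sketch, assuming 8-byte (complex64) visibilities and ignoring flags, weights and metadata. The slide does not state every input (for example the number of antennas or the on-disk format), so this is meant for scaling estimates rather than for reproducing the ~20 TB figure.

```python
def visibility_volume_bytes(n_ant: int, n_chan: int, n_pol: int,
                            duration_s: float, dump_s: float,
                            bytes_per_vis: int = 8,
                            include_autos: bool = True) -> int:
    """Estimate raw visibility volume in bytes.

    Assumes one complex value of `bytes_per_vis` bytes per baseline,
    channel, polarisation product and dump; flags, weights and
    metadata are ignored.
    """
    n_cross = n_ant * (n_ant - 1) // 2          # cross-correlation baselines
    n_bl = n_cross + (n_ant if include_autos else 0)
    n_dumps = int(duration_s // dump_s)          # complete dumps only
    return n_bl * n_chan * n_pol * n_dumps * bytes_per_vis


if __name__ == "__main__":
    # Hypothetical inputs; substitute the actual array configuration.
    tb = visibility_volume_bytes(n_ant=55, n_chan=32768, n_pol=4,
                                 duration_s=8 * 3600, dump_s=8.0) / 1e12
    print(f"~{tb:.1f} TB")
```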


The Data Flow

[Figure: MeerKAT → Correlator / SDP Cal (L0), on site in the Karoo → Archive (L0+L1) at CHPC, Cape Town → Data Transfer Node(s) at CHPC, Cape Town → IDIA*. Link speeds shown: 100 Gb/s and 10/40/100 Gb/s.]

*Or any other third-party data centre (in theory).


Data Transfer: Current State

[Figure: the same data-flow diagram as on the Data Flow slide (MeerKAT → Correlator / SDP Cal → Archive → Data Transfer Node(s) → IDIA*).]

• The AOD informs the PI that the data are ready.

• The PI instructs the AOD to trigger a push to the DTN.

• AOD/Dev confirms that the data have arrived on the DTN.

• AOD/Dev contacts IDIA to initiate the pull.

• The data arrive at IDIA.

• The PI is informed.
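The current procedure is a strictly ordered sequence of hand-offs between people. Modelling it as an ordered set of stages makes each hand-off explicit, for instance to track where a given transfer is waiting. A minimal sketch; the stage names are hypothetical labels for the bullets above, not part of any SARAO or IDIA system.

```python
from enum import Enum, auto


class TransferStep(Enum):
    """Stages of the current, manually coordinated transfer (names hypothetical)."""
    DATA_READY = auto()      # AOD informs the PI that the data are ready
    PUSH_REQUESTED = auto()  # PI asks the AOD to trigger a push to the DTN
    ON_DTN = auto()          # AOD/Dev confirm the data are on the DTN
    PULL_REQUESTED = auto()  # AOD/Dev ask IDIA to start the pull
    ARRIVED = auto()         # data arrive at IDIA
    PI_INFORMED = auto()     # PI is told the data are available


# Enum iteration follows definition order, so this is the workflow order.
WORKFLOW = list(TransferStep)


def advance(step: TransferStep) -> TransferStep:
    """Return the stage that follows `step`; the final stage has no successor."""
    i = WORKFLOW.index(step)
    if i == len(WORKFLOW) - 1:
        raise ValueError(f"{step.name} is the final stage")
    return WORKFLOW[i + 1]


if __name__ == "__main__":
    step = TransferStep.DATA_READY
    while step is not TransferStep.PI_INFORMED:
        nxt = advance(step)
        print(f"{step.name} -> {nxt.name}")
        step = nxt
```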


Data Transfer: Current State

• IDIA/SARAO Data Transfer Node.

• Raw data are read off the S3 object store (RADOS + NPY arrays).

• The data are converted to a Measurement Set (MS) and written to the DTN.

• DTN push request initiated.

• GridFTP Transfer queued (managed by FTS).

• Received at IDIA cluster.


Data Transfer: In Development

[Figure: the same data-flow diagram as on the Data Flow slide (MeerKAT → Correlator / SDP Cal → Archive → Data Transfer Node(s) → IDIA*).]

• The PI checks the archive interface (via VPN) for data.

• An IDIA-affiliated PI can Push-To-IDIA (it's a button).

• The MS data are transferred directly to the appropriate IDIA directory.

• Transfer progress can be monitored on the archive dashboard.


Data Transfer: In Development with SARAO

Screenshot Courtesy of Chris Schollar




Data Quality Assurance

• Jordan Collier, in close collaboration with MeerKAT SDP.

• Framework to measure quality of pipelines.

• Science context (based on the Large Survey Projects, LSPs) to be included in the standard SDP Cal report.

• SDP Cal Pipeline adjusted for science output.

• Mapping science requirements from LSP to technical requirements for pipelines.
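As one illustration of turning a science requirement into a technical one, a target image noise σ can be converted into the required on-source time using the standard radiometer equation for a naturally weighted interferometer image, σ = SEFD / (η_c √(n_pol N(N−1) Δν t)). A minimal sketch; it is not presented as the DQA framework's own method, and the SEFD, efficiency and other values in the usage lines are placeholders rather than MeerKAT specifications.

```python
import math


def image_noise_jy(sefd_jy: float, n_ant: int, bandwidth_hz: float,
                   t_s: float, n_pol: int = 2, eta_c: float = 1.0) -> float:
    """Thermal noise of a naturally weighted image (radiometer equation)."""
    return sefd_jy / (eta_c * math.sqrt(n_pol * n_ant * (n_ant - 1)
                                        * bandwidth_hz * t_s))


def required_time_s(target_sigma_jy: float, sefd_jy: float, n_ant: int,
                    bandwidth_hz: float, n_pol: int = 2,
                    eta_c: float = 1.0) -> float:
    """Invert the radiometer equation: on-source time needed for a target noise."""
    return (sefd_jy / (eta_c * target_sigma_jy)) ** 2 / (
        n_pol * n_ant * (n_ant - 1) * bandwidth_hz)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    t = required_time_s(target_sigma_jy=10e-6, sefd_jy=400.0, n_ant=55,
                        bandwidth_hz=500e6)
    print(f"required on-source time: {t / 3600:.2f} h")
```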


Data Quality Assurance


IDIA Pipelines

• processMeerKAT Pipeline.

• Package for processing on HPC (SLURM + ILIFU Cluster).

• To be generalised for use on PBS/Torque-controlled systems.

• Robust, generic, fast implementation of a priori calibration (including flagging).

• General-purpose selfcal.

• Aim: calibration time comparable to observing time, T(cal) ~ T(obs).

• Framework.

• Best practices on how to use SLURM and MPICASA.

• Developer’s Guide.

• How to write and include your own modules in the pipeline.
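The slides do not say how user modules are added to processMeerKAT. One common pattern for this kind of extensibility is a registry of named steps that a pipeline run executes in order. A minimal sketch, in which every name (`pipeline_step`, `run_pipeline`, the example steps and the state dictionary) is hypothetical and not processMeerKAT's actual interface.

```python
from typing import Callable, Dict, Iterable

# A step receives the pipeline state and returns the (possibly updated) state.
Step = Callable[[dict], dict]
STEPS: Dict[str, Step] = {}


def pipeline_step(name: str) -> Callable[[Step], Step]:
    """Decorator that registers a function as a named pipeline step."""
    def register(func: Step) -> Step:
        if name in STEPS:
            raise ValueError(f"step {name!r} is already registered")
        STEPS[name] = func
        return func
    return register


@pipeline_step("flag")
def flag(state: dict) -> dict:
    # Placeholder: a real module would call the CASA flagging tasks here.
    return {**state, "flagged": True}


@pipeline_step("my_custom_step")
def my_custom_step(state: dict) -> dict:
    # A user-written module only needs the decorator to become available.
    return {**state, "custom": state.get("custom", 0) + 1}


def run_pipeline(order: Iterable[str], state: dict) -> dict:
    """Run the named steps in order, failing early on unknown names."""
    order = list(order)
    missing = [n for n in order if n not in STEPS]
    if missing:
        raise KeyError(f"unknown steps: {missing}")
    for name in order:
        state = STEPS[name](state)
    return state
```

Checking all step names before running anything means a typo in the step list fails immediately rather than partway through a long calibration run.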


IDIA Pipelines

• Algorithms written using CASA.

• MOU with NRAO.

• Most radio astronomers are familiar with CASA and MSs.

• Heterogeneous applications: single-node/single-threaded, single-node/multi-threaded (OpenMP) and multi-node/multi-threaded (OpenMP + MPI).

• Many pipelines already use CASA for flagging and a priori calibration (gain/bandpass).

• Management:

• Run MPICASA using SLURM (srun), which in turn runs the appropriate container image.

• Sidesteps thread-unsafe SQL quirks.

• Keeps MPI and software quarantined (as recommended by LBL).

• Just Python and Bash (SBATCH).


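mpicasa -hostfile hostnames /path/to/casa --nologger --log2term --nogui -c myscript.py

• mpicasa wraps mpirun; /path/to/casa is the executable; --nologger --log2term --nogui are example options; myscript.py holds the CASA commands, e.g.

    some_task(arg1='blah', arg2=123, arg4='whatever')

• The hostfile "hostnames" assigns MPI slots per host:

    localhost slots=20
    10.0.0.1 slots=30
    10.0.0.2 slots=40

• mpirun then starts the processes under orted: 20 on localhost, 30 on 10.0.0.1 and 40 on 10.0.0.2.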
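Under SLURM the hostfile need not be written by hand. A minimal sketch that builds it from the allocation: the host list is assumed to come from `scontrol show hostnames`, and the per-node counts from the `SLURM_TASKS_PER_NODE` variable, whose compressed form (e.g. `20(x2),40`) repeats a count with `(x#)`. The helper names are hypothetical, and the `-c` flag for passing a script to CASA is an assumption about the CASA version in use.

```python
from typing import List, Sequence


def expand_tasks_per_node(spec: str) -> List[int]:
    """Expand a SLURM_TASKS_PER_NODE string such as '20(x2),40' to [20, 20, 40]."""
    counts: List[int] = []
    for part in spec.split(","):
        if "(x" in part:
            n, repeat = part.rstrip(")").split("(x")
            counts.extend([int(n)] * int(repeat))
        else:
            counts.append(int(part))
    return counts


def hostfile_text(hosts: Sequence[str], tasks_per_node: str) -> str:
    """Render an MPI hostfile with one 'host slots=N' line per node."""
    counts = expand_tasks_per_node(tasks_per_node)
    if len(counts) != len(hosts):
        raise ValueError("host list and task counts differ in length")
    return "".join(f"{h} slots={n}\n" for h, n in zip(hosts, counts))


def mpicasa_argv(hostfile: str, casa: str, script: str,
                 options: Sequence[str] = ("--nologger", "--log2term", "--nogui"),
                 ) -> List[str]:
    """Build the mpicasa command line as an argument list."""
    return ["mpicasa", "-hostfile", hostfile, casa, *options, "-c", script]


if __name__ == "__main__":
    print(hostfile_text(["localhost", "10.0.0.1", "10.0.0.2"], "20,30,40"), end="")
    print(" ".join(mpicasa_argv("hostnames", "/path/to/casa", "myscript.py")))
```

Returning an argument list rather than a single string lets it be passed straight to `subprocess.run` without shell quoting.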


Some Results: 55 dishes, 856 MHz

Almost noise-limited.
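The slides do not say how "almost noise-limited" was judged. One common check compares the measured noise in a source-free region of the image with the expected thermal noise; a ratio near 1 indicates a noise-limited image. A minimal sketch using the median absolute deviation (MAD), scaled by 1.4826 so that it estimates σ for Gaussian noise while staying insensitive to residual sources; the function names and the choice of estimator are assumptions.

```python
import statistics
from typing import Iterable

MAD_TO_SIGMA = 1.4826  # converts MAD to a standard deviation for Gaussian noise


def robust_rms(pixels: Iterable[float]) -> float:
    """Estimate the noise level from pixel values via the median absolute deviation."""
    values = list(pixels)
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    return MAD_TO_SIGMA * mad


def noise_ratio(pixels: Iterable[float], expected_sigma: float) -> float:
    """Measured-to-expected noise ratio; values close to 1 suggest a noise-limited image."""
    return robust_rms(pixels) / expected_sigma
```

A single bright outlier barely moves the MAD, whereas it would inflate a plain standard deviation.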


Status

• COSMOS: 55 dishes, ~8 hr, 150 MHz.

• T(crosscal) = 0.5 T(obs); T(crosscal) + T(image) ~ T(obs) (not optimised!).

• Currently matching SLURM and MPICASA parallelism.

• Not all tasks are parallelised the same.

• partition (I/O-bound), tclean (CPU-bound), flagdata (RAM-bound).

• bandpass and gaincal (not parallelised).

• Given an input MS and operations, decide on robust job parameters.

• Kicking off low-freq selfcal with AP from high frequency (works better than expected).

• Selfcal Recipes.

• 2 in dev (iterative masking).

• Continuum subtraction: efficacy and performance.

• UVLIN vs UVMODEL vs UVLIN+UVMODEL.
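One way to decide on job parameters from an input MS is to size the MPI job from the number of sub-MSs produced by partitioning. A minimal sketch, assuming the rule of thumb of one MPI server process per sub-MS plus one client process; the function and its inputs are hypothetical. Because bandpass and gaincal are not parallelised, most of these processes would sit idle during those steps.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class JobParams:
    n_tasks: int         # total MPI processes (servers + client)
    nodes: int           # nodes to request from SLURM
    tasks_per_node: int  # processes placed on each node


def choose_job_params(n_subms: int, cores_per_node: int) -> JobParams:
    """Size an MPI CASA job from the number of sub-MSs (assumed heuristic)."""
    if n_subms < 1 or cores_per_node < 1:
        raise ValueError("n_subms and cores_per_node must be positive")
    n_tasks = n_subms + 1                        # one server per sub-MS + one client
    nodes = math.ceil(n_tasks / cores_per_node)  # fewest nodes that fit all tasks
    tasks_per_node = math.ceil(n_tasks / nodes)  # spread tasks evenly over nodes
    return JobParams(n_tasks, nodes, tasks_per_node)
```

Spreading tasks evenly (22 per node for 65 tasks on three 32-core nodes, rather than 32 + 32 + 1) keeps per-node memory use similar, which matters for RAM-bound steps such as flagdata.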


Moving Forward

• processMeerKAT currently under performance testing.

• Public IDIA release soon.

• A priori calibration by the end of 2018?

• SLURM/MPICASA User Guide: released soon thereafter.

• Developer's Guide: early 2019.

• Selfcal: February 2019 (planned).