EAS Data Flow lessons learnt

Euclid Archive System from IAL perspective - Lessons learnt. Input for splinter on EAS Data Archive, February 5-7 2014, Munich. Martin Melchior and Marco Soldati


Page 1: EAS Data Flow lessons learnt

Euclid Archive System from IAL perspective - Lessons learnt

Input for splinter on EAS Data Archive

February 5-7 2014, Munich

Martin Melchior and Marco Soldati

Page 2: EAS Data Flow lessons learnt

Terminology (1)

• DRMS (Distributed Resource Management System): scheduler for cluster, grid, cloud

• Submission Host: Host through which users get access to the scheduler (DRMS)

• Execution Hosts: Computing nodes, the Submission host MAY be an Execution Host

• Job: one task sent to the DRMS

Page 3: EAS Data Flow lessons learnt

Terminology (2)

• IAL: Infrastructure Abstraction Layer

• TaskScheduler API: Current DRM API (by IAL)

• File Access Protocols: File, FTP, HTTP, SFTP
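As a minimal sketch of how a file location could be mapped to one of the protocols listed above, the following helper reads the URL scheme. The function name `access_protocol` and the lowercase scheme names are assumptions made for illustration; they are not part of the IAL.

```python
from urllib.parse import urlsplit

# Protocols named on the slide; the lowercase scheme spellings are an assumption.
SUPPORTED_SCHEMES = {"file", "ftp", "http", "sftp"}


def access_protocol(url: str) -> str:
    """Return the access protocol (URL scheme) of a file location.

    Raises ValueError if the scheme is not one of the listed protocols.
    """
    scheme = urlsplit(url).scheme.lower()
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported file access protocol: {scheme!r}")
    return scheme
```

A caller would pass a full location such as `sftp://data/sub_workspace/level0/raw_20140207.fits` and dispatch on the returned scheme.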

Page 4: EAS Data Flow lessons learnt

Diagram: Dataflow in IAL Mock

Labels (including legend entries): SDC-FR; SDC-CH Computing Infrastructure; SDC-EAS; SDC-xx; EAS; Euclid Meta-data Archive System; EMA; Submission Host; Execution Host; Storage; IAL; Task Scheduler; Task Executor; Pipeline Task; Software-Component; Science Community; File storage (FTP, HTTP, File, …); Database (RDB, XML-DB, OODB, …)

File location shown:

http://euclid-archive.fr/level0/raw_20140207.fits

Page 5: EAS Data Flow lessons learnt

Diagram: Dataflow in IAL Mock (same labels as on page 4)

File locations shown:

http://euclid-archive.ch/level0/raw_20140207.fits

http://euclid-archive.fr/level0/raw_20140207.fits

Page 6: EAS Data Flow lessons learnt

Diagram: Dataflow in IAL Mock (same labels as on page 4)

File locations shown:

sftp://data/sub_workspace/level0/raw_20140207.fits

http://euclid-archive.ch/level0/raw_20140207.fits

http://euclid-archive.fr/level0/raw_20140207.fits

file://data/sdc-eas/level0/raw_20140207.fits

Page 7: EAS Data Flow lessons learnt

Diagram: Dataflow in IAL Mock (same labels as on page 4)

File locations shown:

file://data/sub_host/level0/raw_20140207.fits

sftp://data/sub_workspace/level0/raw_20140207.fits

http://euclid-archive.ch/level0/raw_20140207.fits

http://euclid-archive.fr/level0/raw_20140207.fits

file://data/sdc-eas/level0/raw_20140207.fits

file://mnt/exec_workspace/level0/raw_20140207.

Page 8: EAS Data Flow lessons learnt

Diagram: Dataflow in IAL Mock (same labels as on page 4)

File locations shown:

file://data/sub_host/level0/raw_20140207.fits

sftp://data/sub_workspace/level0/raw_20140207.fits

http://euclid-archive.ch/level0/raw_20140207.fits

http://euclid-archive.fr/level0/raw_20140207.fits

file://data/sdc-eas/level0/raw_20140207.fits

file://mnt/exec_workspace/level0/raw_20140207.

file://exec_workspace/level0/raw_20140207.
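The locations shown on these pages all point to the same file, and only the protocol, host, and path prefix differ. A minimal sketch of that observation follows. It assumes that the part of each URL starting at `level0/` identifies the file; the helper name `logical_name` and that convention are illustrative assumptions, not something the slides define. Locations whose file name is cut off in the transcript are left out.

```python
# Complete file locations as they appear on the slides.
LOCATIONS = [
    "file://data/sub_host/level0/raw_20140207.fits",
    "sftp://data/sub_workspace/level0/raw_20140207.fits",
    "http://euclid-archive.ch/level0/raw_20140207.fits",
    "http://euclid-archive.fr/level0/raw_20140207.fits",
    "file://data/sdc-eas/level0/raw_20140207.fits",
]


def logical_name(url: str, marker: str = "level0/") -> str:
    """Return the location-independent part of a URL.

    Assumption for this sketch: everything from `marker` onwards
    identifies the file, whatever protocol or host precedes it.
    """
    index = url.find(marker)
    if index < 0:
        raise ValueError(f"marker {marker!r} not found in {url!r}")
    return url[index:]


# Distinct physical locations versus distinct logical files.
distinct_locations = set(LOCATIONS)
distinct_files = {logical_name(url) for url in LOCATIONS}
```

Here `distinct_files` collapses to a single entry while `distinct_locations` keeps every URL, which is the mismatch a task has to track by hand when it works with raw URLs.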

Page 9: EAS Data Flow lessons learnt

Lessons Learnt

1a. Having the correct URL at the right time is quite error prone

1b. URLs need to be changed in all XML data objects

Abstraction of file handling is required!

2. Creating three copies of a file is too much!

Reduce!
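To illustrate lesson 1b, the sketch below rewrites the location prefix in one XML data object. The slides do not show the XML schema, so the element names `DataObject` and `FileUrl` and the function `rewrite_file_urls` are hypothetical. Every data object that references the file would need the same treatment after each transfer.

```python
import xml.etree.ElementTree as ET

# Hypothetical data object; the real schema is not shown on the slides.
SAMPLE = """<DataObject>
  <FileUrl>http://euclid-archive.fr/level0/raw_20140207.fits</FileUrl>
</DataObject>"""


def rewrite_file_urls(xml_text, old_prefix, new_prefix, tag="FileUrl"):
    """Replace old_prefix by new_prefix in every <tag> element.

    Returns the rewritten XML text and the number of URLs changed.
    """
    root = ET.fromstring(xml_text)
    changed = 0
    for element in root.iter(tag):
        text = (element.text or "").strip()
        if text.startswith(old_prefix):
            element.text = new_prefix + text[len(old_prefix):]
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed


new_xml, changed = rewrite_file_urls(
    SAMPLE, "http://euclid-archive.fr/", "http://euclid-archive.ch/"
)
```

A URL that is missed, or rewritten before the copy has actually arrived, leaves the data object pointing at the wrong place, which is the error-prone step named in lesson 1a.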

Page 10: EAS Data Flow lessons learnt

1. File Handling Abstraction

Diagram labels: SDC-xx; SDC-EAS; EAS; Euclid Meta-data Archive System; EMA; Submission Host; Execution Host; Storage; IAL; IAL/COORS/…; Task Scheduler; Task Executor; Pipeline Task; Euclid File Access Service (EuFAS™)

Requirements on EuFAS:

• Lookup and retrieve files by properties (i.e. unique ID)

• Replicate data on request and/or based on rules

• Add (and remove) files

• Register physical file locations in EMA

• Provide file handling framework/library for “Pipeline Tasks”
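The requirements above could be read as an interface along the following lines. This is an in-memory sketch only: the class and method names are invented for illustration, a plain dictionary stands in for the location registry in EMA, and `replicate` only records the new location instead of transferring any data. Nothing here describes an actual EuFAS design.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class FileRecord:
    """One logical file and the physical locations registered for it."""
    file_id: str
    locations: Set[str] = field(default_factory=set)


class InMemoryFileAccessService:
    """Hypothetical stand-in for a file access service."""

    def __init__(self) -> None:
        # Stands in for the registry of physical locations in EMA.
        self._records: Dict[str, FileRecord] = {}

    def add(self, file_id: str, url: str) -> None:
        """Add a file, or register a further location for a known file."""
        record = self._records.setdefault(file_id, FileRecord(file_id))
        record.locations.add(url)

    def lookup(self, file_id: str) -> List[str]:
        """Return all registered locations; KeyError if the ID is unknown."""
        return sorted(self._records[file_id].locations)

    def replicate(self, file_id: str, target_url: str) -> None:
        """Record a replica of a known file (no data is copied here)."""
        if file_id not in self._records:
            raise KeyError(file_id)
        self._records[file_id].locations.add(target_url)

    def remove(self, file_id: str, url: Optional[str] = None) -> None:
        """Remove one location, or the whole file if no URL is given."""
        if url is None:
            del self._records[file_id]
        else:
            self._records[file_id].locations.discard(url)
```

With such an interface a pipeline task would ask for a file by its ID and receive a usable location, so the URLs no longer need to be stored and rewritten inside the data objects themselves.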

Page 11: EAS Data Flow lessons learnt

Diagram: 2. Reduce number of copies

Labels (including legend entries): SDC-xx; SDC Computing Infrastructure; SDC-EAS; EAS; Euclid Meta-data Archive System; EMA; Submission Host; Execution Host; Storage; IAL; Task Scheduler; Task Executor; Pipeline Task; Software-Component; Science Community; File storage (FTP, HTTP, File, …); Database (RDB, XML-DB, OODB, …)

Page 12: EAS Data Flow lessons learnt

Diagram: 2. Reduce number of copies (same labels as on page 11)

Page 13: EAS Data Flow lessons learnt

Diagram: 2. Reduce number of copies (same labels as on page 11)

Page 14: EAS Data Flow lessons learnt

Diagram: 2. Reduce number of copies (same labels as on page 11)