Euclid Archive System from IAL perspective - Lessons learnt
Input for splinter on EAS Data Archive
February 5-7 2014, Munich
Martin Melchior and Marco Soldati
Terminology (1)
• DRMS (Distributed Resource Management System): scheduler for cluster, grid, cloud
• Submission Host: Host through which users get access to the scheduler (DRMS)
• Execution Hosts: Computing nodes; the Submission Host MAY also be an Execution Host
• Job: one task sent to the DRMS
Terminology (2)
• IAL: Infrastructure abstraction layer
• TaskScheduler API: Current DRM API (by IAL)
• File Access Protocols: File, FTP, HTTP, SFTP
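The protocol list above can be illustrated with a minimal sketch that reads the access protocol from a file URL. The function name `access_protocol` and the set `SUPPORTED_SCHEMES` are assumptions made for this example and are not part of the IAL API; the URLs are the ones used later in the dataflow slides.

```python
from urllib.parse import urlparse

# Schemes matching the protocols named on the slide (File, FTP, HTTP, SFTP).
# The set itself is an assumption for illustration.
SUPPORTED_SCHEMES = {"file", "ftp", "http", "sftp"}


def access_protocol(url: str) -> str:
    """Return the scheme of `url`, or raise ValueError if it is not supported."""
    scheme = urlparse(url).scheme.lower()
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported file access protocol: {scheme!r}")
    return scheme


if __name__ == "__main__":
    for url in (
        "http://euclid-archive.fr/level0/raw_20140207.fits",
        "sftp://data/sub_workspace/level0/raw_20140207.fits",
        "file://data/sdc-eas/level0/raw_20140207.fits",
    ):
        print(access_protocol(url), url)
```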
Dataflow in IAL Mock
[Diagram with legend. Labels: SDC-FR, SDC-xx, SDC-EAS, Euclid Meta-data Archive System, EAS, EMA, SDC-CH Computing Infrastructure, Submission Host, Execution Host, Storage, File storage (FTP, HTTP, File, …), Database (RDB, XML-DB, OODB, …), IAL, Task Scheduler, Task Executor, Pipeline Task, Software-Component, Science Community]
File location shown:
• http://euclid-archive.fr/level0/raw_20140207.fits
Dataflow in IAL Mock (continued)
[Same diagram as before]
File locations shown:
• http://euclid-archive.ch/level0/raw_20140207.fits
• http://euclid-archive.fr/level0/raw_20140207.fits
Dataflow in IAL Mock (continued)
[Same diagram as before]
File locations shown:
• sftp://data/sub_workspace/level0/raw_20140207.fits
• http://euclid-archive.ch/level0/raw_20140207.fits
• http://euclid-archive.fr/level0/raw_20140207.fits
• file://data/sdc-eas/level0/raw_20140207.fits
Dataflow in IAL Mock (continued)
[Same diagram as before]
File locations shown:
• file://data/sub_host/level0/raw_20140207.fits
• sftp://data/sub_workspace/level0/raw_20140207.fits
• http://euclid-archive.ch/level0/raw_20140207.fits
• http://euclid-archive.fr/level0/raw_20140207.fits
• file://data/sdc-eas/level0/raw_20140207.fits
• file://mnt/exec_workspace/level0/raw_20140207.
Dataflow in IAL Mock (continued)
[Same diagram as before]
File locations shown:
• file://data/sub_host/level0/raw_20140207.fits
• sftp://data/sub_workspace/level0/raw_20140207.fits
• http://euclid-archive.ch/level0/raw_20140207.fits
• http://euclid-archive.fr/level0/raw_20140207.fits
• file://data/sdc-eas/level0/raw_20140207.fits
• file://mnt/exec_workspace/level0/raw_20140207.
• file://exec_workspace/level0/raw_20140207.
Lessons Learnt
1a. Having the correct URL at the right time is quite error-prone
1b. URLs need to be changed in all XML data objects
Abstraction of file handling is required!
2. Creating three copies of a file is too much!
Reduce!
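Lessons 1a and 1b can be illustrated with a small sketch in which the XML data object carries only a location-independent file ID and the physical URL is looked up when the file is needed. Everything here is hypothetical: the `DataObject` and `File` element names, the `id` attribute, the `SITE_LOCATIONS` table, and the association of each archive URL with a site are assumptions for illustration, not the Euclid data model.

```python
import xml.etree.ElementTree as ET

# Hypothetical data object: it stores a logical file ID, not a URL,
# so it does not have to be rewritten when the file is copied.
DATA_OBJECT_XML = '<DataObject><File id="level0/raw_20140207.fits"/></DataObject>'

# Hypothetical per-site lookup table (URLs taken from the dataflow slides;
# the site-to-URL association is assumed).
SITE_LOCATIONS = {
    "SDC-FR": {"level0/raw_20140207.fits":
               "http://euclid-archive.fr/level0/raw_20140207.fits"},
    "SDC-CH": {"level0/raw_20140207.fits":
               "http://euclid-archive.ch/level0/raw_20140207.fits"},
}


def resolve_files(xml_text: str, site: str) -> list:
    """Return the physical URLs at `site` for every file in the data object."""
    root = ET.fromstring(xml_text)
    table = SITE_LOCATIONS[site]
    return [table[elem.get("id")] for elem in root.iter("File")]


if __name__ == "__main__":
    # The same, unmodified XML resolves to a different URL at each site.
    print(resolve_files(DATA_OBJECT_XML, "SDC-FR"))
    print(resolve_files(DATA_OBJECT_XML, "SDC-CH"))
```

With this arrangement only the lookup table changes when a file is moved or replicated; the XML data objects stay untouched.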
1. File Handling Abstraction
[Diagram. Labels: SDC-xx, SDC-EAS, Euclid Meta-data Archive System, EAS, EMA, Submission Host, Execution Host, Storage, Pipeline Task, IAL/COORS/…, Task Scheduler, Task Executor, IAL, Euclid File Access Service (EuFAS™)]
Requirements on EuFAS:
• Lookup and retrieve files by properties (i.e. unique ID)
• Replicate data on request and/or based on rules
• Add (and remove) files
• Register physical file locations in EMA
• Provide file handling framework/library for “Pipeline Tasks”
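The requirements above could translate into an interface along the following lines. This is only a sketch under stated assumptions: the class and method names (`FileAccessService`, `add`, `remove`, `lookup`, `replicate`) are invented here, the in-memory dictionary stands in for the registration of physical locations in EMA, and no bytes are actually transferred.

```python
from abc import ABC, abstractmethod


class FileAccessService(ABC):
    """Hypothetical interface derived from the EuFAS requirement list."""

    @abstractmethod
    def add(self, file_id: str, url: str) -> None:
        """Add a file by registering one physical location for it."""

    @abstractmethod
    def remove(self, file_id: str) -> None:
        """Remove a file and all of its registered locations."""

    @abstractmethod
    def lookup(self, file_id: str) -> list:
        """Return the known physical locations for a unique file ID."""

    @abstractmethod
    def replicate(self, file_id: str, target_url: str) -> None:
        """Create a replica on request and register its location."""


class InMemoryFileAccessService(FileAccessService):
    """Toy implementation; the dict stands in for the EMA registry."""

    def __init__(self) -> None:
        self._registry = {}

    def add(self, file_id, url):
        locations = self._registry.setdefault(file_id, [])
        if url not in locations:
            locations.append(url)

    def remove(self, file_id):
        self._registry.pop(file_id, None)

    def lookup(self, file_id):
        return list(self._registry.get(file_id, []))

    def replicate(self, file_id, target_url):
        if file_id not in self._registry:
            raise KeyError(file_id)
        # A real service would copy the data here; this sketch only
        # records the new physical location.
        self.add(file_id, target_url)


if __name__ == "__main__":
    service = InMemoryFileAccessService()
    file_id = "level0/raw_20140207.fits"
    service.add(file_id, "http://euclid-archive.fr/level0/raw_20140207.fits")
    service.replicate(file_id, "http://euclid-archive.ch/level0/raw_20140207.fits")
    print(service.lookup(file_id))
```

A pipeline task written against such an interface would ask for a file by ID and never handle site-specific URLs itself, which is the abstraction the lessons call for.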
2. Reduce number of copies
[Diagram with legend. Labels: SDC-xx, SDC-EAS, Euclid Meta-data Archive System, EAS, EMA, SDC Computing Infrastructure, Submission Host, Execution Host, Storage, File storage (FTP, HTTP, File, …), Database (RDB, XML-DB, OODB, …), IAL, Task Scheduler, Task Executor, Pipeline Task, Software-Component, Science Community]