LSST: Preparing for the Data Avalanche through Partitioning, Parallelization, and Provenance
Kirk Borne (Perot Systems Corporation / NASA GSFC and George Mason University / LSST)
The LSST research and development effort is funded in part by the National Science Foundation under Scientific Program Order No. 9 (AST-0551161) through Cooperative Agreement AST-0132798. Additional funding comes from private donations, in-kind support at Department of Energy laboratories, and other LSSTC Institutional Members.
National Optical Astronomy Observatory
Research Corporation
The University of Arizona
University of Washington
Brookhaven National Laboratory
Harvard-Smithsonian Center for Astrophysics
Johns Hopkins University
Las Cumbres Observatory, Inc.
Lawrence Livermore National Laboratory
Stanford Linear Accelerator Center
Stanford University
The Pennsylvania State University
University of California, Davis
University of Illinois at Urbana-Champaign
ABSTRACT: The Large Synoptic Survey Telescope (LSST) project will produce 30 terabytes of data daily for 10 years, resulting in a 65-petabyte final image data archive and a 70-petabyte final catalog (metadata) database. This large telescope will begin science operations in 2014 at Cerro Pachon in Chile. It will operate with the largest camera in use in astronomical research: 3 gigapixels, covering 10 square degrees, roughly 3000 times the coverage of one Hubble Space Telescope image. One co-located pair of 6-gigabyte sky images is acquired, processed, and ingested every 40 seconds. Within 60 seconds, notification alerts for all objects that are dynamically changing (in time or location) are sent to astronomers around the world. We expect roughly 100,000 such events every night. Each spot on the available sky will be re-imaged in pairs approximately every 3 days, resulting in about 2000 images per sky location after 10 years of operations (2014-2024). The processing, ingest, storage, replication, query, access, archival, and retrieval functions for this dynamic data avalanche are currently being designed and developed by the LSST Data Management (DM) team, under contract from the NSF. Key challenges to success include: the processing of this enormous data volume, real-time database updates and alert generation, the dynamic nature of every entry in the object database, the complexity of the processing and schema, the requirements for high availability and fast access, spatial-plus-temporal indexing of all database entries, and the creation and maintenance of multiple versions and data releases. To address these challenges, the LSST DM team is implementing solutions that include database partitioning, parallelization, and provenance (generation and tracking). The prototype LSST database schema currently has over 100 tables, including catalogs for sources, objects, moving objects, image metadata, calibration and configuration metadata, and provenance. Techniques for managing this database must satisfy intensive scaling and performance requirements. These techniques include data and index partitioning, query partitioning, parallel ingest, replication of hot-data, horizontal scaling, and automated fail-over. In the area of provenance, the LSST database will capture all information that is needed to reproduce any result ever published. Provenance-related data include: telescope/camera instrument configuration; software configuration (software versions, policies used); and hardware setup (configuration of nodes used to run LSST software). Provenance is very dynamic, in the sense that the metadata to be captured change frequently. The schema has to be flexible to allow that. In our current design, over 30% of the schema is dedicated to provenance. Our philosophy is this: (1) minimize the impact of reconfiguration by avoiding tight coupling between data and provenance: hardware and software configurations are correlated with science data via a single ProcessingHistory_id; and (2) minimize the volume of provenance information by grouping together objects with identical processing history.
LSST = Large Synoptic Survey Telescope
8.4-meter diameter primary mirror = 10 square degree field of view!
(design, development, construction, and operations of telescope & observatory funded by NSF)
(mirror funded by private donors)
LSST Camera = 201 CCDs @ 4096x4096 pixels each = 3 gigapixels = 6 GB per image, covering 10 sq. degrees = ~3000 times the area of one Hubble Space Telescope image
(camera funded by DOE)
Focal Plane Array scale model
Observing Strategy: One pair of images every 40 seconds for each spot on the sky, then continue across the sky continuously every night for 10 years (2014-2024), with time-domain sampling in log(time) intervals (to capture the dynamic range of transients).
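As an illustration of sampling in log(time) intervals, here is a minimal sketch that generates logarithmically spaced revisit baselines between the 40-second visit pair and the 10-year survey length; the number of samples is an arbitrary choice for illustration, not the actual LSST cadence:

```python
import numpy as np

# Illustrative only: generate revisit baselines spaced uniformly in log(time),
# so that transients are sampled from minutes out to years.  The sample count
# below is hypothetical, not the actual LSST cadence.
t_min_days = 40.0 / 86400.0   # shortest baseline: the 40-second visit pair
t_max_days = 10 * 365.25      # longest baseline: the 10-year survey
n_samples = 20                # hypothetical number of log-spaced baselines

log_spaced_intervals = np.logspace(np.log10(t_min_days),
                                   np.log10(t_max_days),
                                   n_samples)
print(log_spaced_intervals)   # days between repeat visits, log-uniform
```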
Data Products
• Image Archive (65 Petabytes! after 10 years)
  – 2000 visits for each 10 sq. degree patch of sky
  – 2000 patches in the viewable sky
  – 30 Terabytes per night (~2000 images)
• Object, Moving Object, & Source catalogs
• Alert Notifications (see the illustrative sketch after this list):
  – VOEvent message protocol
  – 100,000 alerts per night
  – Anything that has changed (moving or optically variable)
• Full project database (70 Petabytes!)
• Uniformly processed data releases (annual)
• Deep co-added image of the sky:
  – Individual images reach 24th magnitude
  – Deep stacked image reaches 27th magnitude
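For illustration, a rough sketch of the kind of payload an alert notification might carry before being packaged as a VOEvent message; the field names and threshold below are hypothetical, not the actual LSST alert format:

```python
# Hypothetical alert payload, for illustration only.  In production these
# fields would be packaged as a VOEvent XML message, not a Python dict.
alert = {
    "alert_id": "LSST-2014-000001",    # hypothetical identifier format
    "ra_deg": 150.1163,                # position of the changed object
    "dec_deg": -2.2058,
    "mjd": 56710.2534,                 # time of the triggering exposure
    "filter": "r",
    "change_sigma": 7.4,               # significance of the detected change
    "kind": "optically variable",      # or "moving", per the poster
}

def should_publish(alert, threshold_sigma=5.0):
    """Publish anything that has changed above a hypothetical threshold."""
    return alert["change_sigma"] >= threshold_sigma

if should_publish(alert):
    print("send VOEvent for", alert["alert_id"])
```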
Database Contents
• >100 database tables:
  – Source catalog
  – Object catalog
  – Moving Object catalog
  – Variable Object catalog
  – Alerts catalog
  – Calibration metadata
  – Configuration metadata
  – Processing metadata
  – Provenance metadata
  – etc.

Source: 260 billion rows *, 2,000 partitions *, 306 bytes/row, 1 row = data for 1 filter
Object: 22 billion rows *, 2,000 partitions *, 1.8 KB/row, 1 row = data for 6 filters
Image Metadata: 675 million rows *, 1 row = metadata for 1 CCD amplifier
* as of Data Release 1 (DR1), 2014
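A back-of-the-envelope sketch of the raw catalog volumes implied by the row counts and row sizes quoted above (it ignores indexes, replication, and later data releases):

```python
# Back-of-the-envelope catalog sizes from the DR1 numbers quoted above.
# Ignores indexes, replicas, and later data releases.
tables = {
    # name: (rows, bytes per row)
    "Source":        (260e9, 306),
    "Object":        (22e9,  1800),   # 1.8 KB/row
    "ImageMetadata": (675e6, None),   # row size not quoted on the poster
}

for name, (rows, row_bytes) in tables.items():
    if row_bytes is None:
        print(f"{name}: {rows:.3g} rows (row size not quoted)")
        continue
    terabytes = rows * row_bytes / 1e12
    print(f"{name}: {rows:.3g} rows x {row_bytes} B/row ~ {terabytes:.0f} TB")
```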
Data Management System Layered Architecture: for scalability, reliability, and evolution

Application Layer (Scientific Layer; Object Oriented, C++; Custom Software):
• Data Products: Source Catalog, Object Catalog, Images, Alerts
• Pipelines: Image Processing, Detection, Association, Moving Object, Classification, Calibration
• Application Framework: Image, Astronomical Object, Catalog, Collection, Table, Meta-Data, Component, Processing Stage, Processing Slice (see the sketch after this figure)

Middleware Layer (Portability, Standard Services; Open Source, Off-the-shelf Software; Custom Integration):
• Data Access: Data Access Framework, Distributed File System, Database Management System
• Distributed Processing: Pipeline Construction, Pipeline Execution, Management and Control
• User Interface: Data Quality Visualization, VO Interfaces
• System Administration, Operations, Security: System Resource Management, User Management, Certificate-based Security

Infrastructure Layer (Distributed Platform; Off-the-shelf, Commercial Hardware/Software):
• Computing: Clusters/Servers, Operating System
• Storage: Disk, Tape, Controllers, Storage Management Software
• Communications: Fiber, Switches, Routers, Firewalls, Communications Stacks, Network Management Software
• Physical Plant: Power, Cooling, Space
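The application framework names Processing Stage and Processing Slice components, and the middleware provides pipeline construction and execution. As a rough, hypothetical sketch of that stage/slice pattern (class and function names below are invented for illustration and are not the LSST middleware API), a pipeline could be assembled from stages and run over data-parallel slices:

```python
from multiprocessing import Pool

# Hypothetical sketch: a pipeline is an ordered list of stages; each slice
# applies the whole pipeline to one data chunk (e.g. one CCD), giving both
# CPU and data parallelism.  All names here are illustrative.
class Stage:
    def process(self, data):
        raise NotImplementedError

class BiasSubtractStage(Stage):
    def process(self, data):
        return {**data, "bias_subtracted": True}

class DetectSourcesStage(Stage):
    def process(self, data):
        return {**data, "n_sources": 42}   # placeholder detection result

class Pipeline:
    def __init__(self, stages):
        self.stages = stages

    def run_slice(self, data_chunk):
        for stage in self.stages:
            data_chunk = stage.process(data_chunk)
        return data_chunk

def run_parallel(pipeline, chunks, n_workers=4):
    with Pool(n_workers) as pool:
        return pool.map(pipeline.run_slice, chunks)

if __name__ == "__main__":
    pipeline = Pipeline([BiasSubtractStage(), DetectSourcesStage()])
    ccd_chunks = [{"ccd_id": i} for i in range(8)]   # one chunk per CCD
    print(run_parallel(pipeline, ccd_chunks))
```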
Database System Design: Meeting Massive Data Management Challenges with Parallelization, Partitioning, and Providing Virtual Data through Provenance
• Pipeline processing:
  – CPU and Data Parallelization
• Database access:
  – Sky Partitioning (see the sketch after this list):
    • Spatial clustering of Source and Object Catalogs
    • Temporal partitioning of DIASource catalog
• Database volume:
  – Provenance (>30% of current DB schema, but <1% of DB volume)
  – Without provenance unique identifiers (see below), the DB volume for tracking provenance (@ 1-second time resolution for 10 years) would be at least 10x larger.
• Why Provenance?
  – To keep data volume "manageable": maintaining provenance allows us to discard intermediate data products and rebuild them at will, on-the-fly: VIRTUAL DATA!
• Implementing Provenance:
  – A unique Processing_ID is used throughout the DB. This 4-byte Processing_ID uniquely identifies the full hardware & software configuration (telescope, instrument, processing pipeline version, processing parameters, etc.) for each source measurement (roughly 100 million sources per image pair, with 1000 image pairs per night, every night).
  – Minimize the impact of reconfiguration by avoiding tight coupling between data and provenance: hardware and software configurations are correlated with data via a single Processing_ID, which is included in all major DB tables (Object, Source, ...).
  – Minimize the volume of provenance information by grouping together objects with identical processing history.
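A minimal sketch of the sky-partitioning idea, assuming a simple declination-band plus RA-chunk scheme sized to the 2,000 partitions quoted above; the actual LSST partitioning scheme is more sophisticated than this:

```python
N_PARTITIONS = 2000   # matches the partition count quoted for Source/Object

def spatial_partition(ra_deg, dec_deg, n_bands=40):
    """Map (RA, Dec) to a partition id.

    Hypothetical scheme: split the sky into declination bands, then split
    each band in RA so that roughly N_PARTITIONS chunks result.  This is
    illustrative only.
    """
    chunks_per_band = N_PARTITIONS // n_bands
    band = min(int((dec_deg + 90.0) / 180.0 * n_bands), n_bands - 1)
    chunk = min(int(ra_deg / 360.0 * chunks_per_band), chunks_per_band - 1)
    return band * chunks_per_band + chunk

def temporal_partition(mjd, days_per_partition=30):
    """Hypothetical time-based partition for the DIASource catalog."""
    return int(mjd // days_per_partition)

# Example: co-locate a source's rows with its sky neighbours, and its
# difference-image detections with others from the same month.
print(spatial_partition(ra_deg=150.1, dec_deg=-2.2))
print(temporal_partition(mjd=56710.25))
```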
Sample DB Table tracking provenance history
• The validity of a given set of parameters (hardware, software, processing, telescope, or instrument) for any database entry (source, object, etc.) is determined by the "validity" time for that unique provenance identifier.
• Sample Table: Provenance of Pipeline Run Configuration
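A minimal sketch of the validity-time lookup described above; the record layout and field names are illustrative, loosely modeled on the prv_cnf_* configuration tables shown in the schema diagrams below, not the actual LSST DDL:

```python
from dataclasses import dataclass
from datetime import datetime

# Illustrative configuration-validity records, loosely modeled on the
# prv_cnf_* tables: each row says "this configuration of item X was in
# force from validity_start to validity_end".  Names are hypothetical.
@dataclass
class ConfigValidity:
    item_id: int            # e.g. a pipeline, filter, or amplifier id
    config: dict            # the parameter values in force
    validity_start: datetime
    validity_end: datetime

def config_at(records, item_id, when):
    """Return the configuration of `item_id` that was valid at time `when`."""
    for r in records:
        if r.item_id == item_id and r.validity_start <= when < r.validity_end:
            return r.config
    return None

records = [
    ConfigValidity(1, {"pipeline_version": "2.6.0"},
                   datetime(2014, 1, 1), datetime(2014, 3, 1)),
    ConfigValidity(1, {"pipeline_version": "2.6.1"},
                   datetime(2014, 3, 1), datetime(2024, 1, 1)),
]

# Which pipeline version processed a source measured on 2014-06-15?
print(config_at(records, item_id=1, when=datetime(2014, 6, 15)))
```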
[UML class diagram: Image Metadata (Package: Main Telescope, Version: 2.6.1, Author: Jacek Becla). Tables include FPA_Exposure, Science_FPA_Exposure, Calibration_FPA_Exposure, CCD_Exposure, Amp_Exposure, Visit, TemplateImage, PSF, prv_Filter, prv_Amplifier, prv_cnf_MaskAmpImage, and the WCS, PSF, and PhotoCal tables at amplifier, CCD, and focal-plane (FPA) levels, linked by exposure, visit, filter, and amplifier identifiers.]
[UML class diagram: Calibration (Package: Main Telescope, Version: 2.6.1, Author: Jacek Becla). Tables include FPA_Exposure, Science_FPA_Exposure, Bias_FPA_Exposure, Dark_FPA_Exposure, Flat_FPA_Exposure, the combined master calibration exposures (Bias_FPA_CMExposure, Dark_FPA_CMExposure, Flat_FPA_CMExposure, Fringe_FPA_CMExposure), and the linking tables (_FPA_Bias2CMExposure, _FPA_Dark2CMExposure, _FPA_Flat2CMExposure, _FPA_Fringe2CMExposure, _Science_FPA_Exposure_Group).]
[UML class diagram: Hardware Provenance (Package: Provenance, Version: 2.6.1, Author: Jacek Becla). Tables include prv_Telescope, prv_FocalPlane, prv_Raft, prv_CCD, prv_Amplifier, prv_Filter, their configuration tables (prv_cnf_Telescope, prv_cnf_FocalPlane, prv_cnf_Raft, prv_cnf_CCD, prv_cnf_Amplifier, prv_cnf_Filter, prv_cnf_MaskAmpImage), and Amp_Exposure, linked by telescope, focal plane, raft, CCD, amplifier, and filter identifiers.]
[UML class diagram: Software Provenance (Package: Provenance, Version: 2.6.1, Author: Jacek Becla). Tables include prv_Run, prv_Pipeline, prv_Stage, prv_Slice, prv_Node, prv_Policy, prv_UpdatableTable, prv_UpdatableColumn, the association tables (prv_Pipeline2Run, prv_Stage2Pipeline, prv_Stage2Slice, prv_Stage2UpdatableColumn), and their configuration tables (prv_cnf_Policy, prv_cnf_Node, prv_cnf_Slice, prv_cnf_Pipeline2Run, prv_cnf_Stage2Pipeline, prv_cnf_Stage2Slice, prv_cnf_Stage2UpdatableColumn), linked by run, pipeline, stage, slice, node, and policy identifiers.]
[UML class diagram: Provenance (Package: Provenance, Version: 2.6.1, Author: Jacek Becla). The prv_ProcHistory table links, via procHistoryId, to the Object, Source, DIASource, MovingObject, Amp_Exposure, CCD_Exposure, and FPA_Exposure tables, and to prv_Stage through prv_Stage2ProcHistory and prv_Snapshot.]
Objects, sources, and exposures processed using the same stages can share one instance of ProcessingHistory. Each instance of ProcessingHistory keeps track of the stages that were run as part of that ProcessingHistory, and of the time window during which each stage ran. This time window can then be used to locate the appropriate configuration of every configurable piece of hardware and software (the prv_cnf_* tables). Note that multiple configurations of the same item may exist for a given stage. This is allowed for cases where the reconfiguration is believed not to affect the data (for example, a slice can fail and be restarted on a different node).
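A minimal sketch of the grouping idea described in the note above: objects processed by the same set of stage runs share one ProcessingHistory record, so provenance volume scales with the number of distinct processing histories rather than with the number of objects (identifiers and structures below are hypothetical):

```python
# Hypothetical sketch: intern identical processing histories so that many
# objects share one procHistoryId.  Keys and id formats are illustrative.
_history_registry = {}   # maps a frozen set of stage runs -> procHistoryId

def get_proc_history_id(stage_runs):
    """stage_runs: iterable of (stage_name, start_time, end_time) tuples."""
    key = frozenset(stage_runs)
    if key not in _history_registry:
        _history_registry[key] = len(_history_registry) + 1  # compact integer id
    return _history_registry[key]

# Two sources reduced by the same pipeline run share a single history record.
run_a = [("ImageProcessing", "2014-06-15T03:00", "2014-06-15T03:01"),
         ("Detection",       "2014-06-15T03:01", "2014-06-15T03:02")]
source_1 = {"source_id": 1, "procHistoryId": get_proc_history_id(run_a)}
source_2 = {"source_id": 2, "procHistoryId": get_proc_history_id(run_a)}
assert source_1["procHistoryId"] == source_2["procHistoryId"]
print(len(_history_registry), "distinct processing histories stored")
```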
Sample database schema shown here:
- Image
- Calibration
- Provenance
Each of these illustrates the vast collection of system parameters that are used to track the state of the end-to-end data system: what was the state of the telescope & camera during an image exposure, what were the values of the numerous calibration parameters, and what was the state & version of all relevant hardware and software components in the pipeline during data processing and ingest.
Solution: Track all of these system parameters (provenance) with a single (unique) Processing_ID.
Overview: LSST Sky Survey and Data System
Data Management System Architecture:
• Multiple, specialized processing sites:
  – Mountain/Base for real-time data reduction
  – Archive Center for non-real-time data reduction, archival data storage, data release
  – Data Access Centers for external data access
Common Pipeline Components
[Data flow diagram showing the common pipeline components: Data Acquisition; Infrastructure; Image Processing Pipeline; Detection Pipeline; Association Pipeline; Moving Object Pipeline; Classification Pipeline; Deep Detect Pipeline; Calibration Pipeline; Alert Processing; the Image Archive, Source Catalog, Object Catalog, Deep Object Catalog, Orbit Catalog, Alert Archive, Engineering/Facility Data Archive, and Calibration Data; User Tools (Query, Data Quality Analysis, Monitoring); and a VO Compliant Interface Middleware layer.]