Surace 2014

Jason Surace (Data Systems Lead)

Zwicky Transient Facility Data System

Where is this Happening?

Infrared Processing and Analysis Center

IPAC is a multimission science center on the Caltech campus originally founded for the IRAS mission. Primarily funded by NASA, IPAC handles data processing, archiving, outreach, and/or command and control functions for IRAS, ISO, Spitzer, GALEX, Herschel, Planck, and WISE, as well as 2MASS, KI, and PTI. Also hosts NED, NStED and IRSA. Located in two buildings with extensive modern data rooms in each.

IPAC does more than infrared! Recent ground-based project involvement includes LCOGT and LSST.

Pictured: Keith-Spalding (Spitzer Science Center) and Morrisroe Astroscience Laboratory.

PTF Data System Development

IPAC began pipeline and archive development for the Palomar Transient Factory in 2008. Small core team with significant contributions from partners, students, and post-docs.

PTF has been a continuous learning process, with evolution of the system to meet both a changing science program and evolving understanding of the science requirements. High agility level required.

Very large data volumes and limited resources. Significant leverage of existing IPAC expertise and in-house facility resources (e.g. IRSA, ICE, etc).

Current PTF Data Holdings

R-band: 1247 nights, 3 million images.

Physical Infrastructure

IPAC Morrisroe Computer Center

• 24 drones with 240 cores.

• ~0.5 PB of data on spinning enterprise disks.

• Tape backup deep storage system.

• 86 TB database server.

• A lot of “junk”: 10G network switches, development computers, archive servers, fiber cards, racks, etc.

System will be scaled up by a factor of 10 for ZTF.
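As a rough back-of-envelope illustration of the factor-of-10 scale-up, the sketch below multiplies the PTF figures quoted above by ten. Applying the same factor uniformly to every resource is an assumption made here for illustration; the slide gives only the overall factor, and the dictionary keys and function name are hypothetical.

```python
# PTF figures are taken from the slide; uniform 10x scaling is an assumption.
PTF_RESOURCES = {
    "drones": 24,
    "cores": 240,
    "disk_pb": 0.5,
    "database_tb": 86,
}
ZTF_SCALE_FACTOR = 10


def scale_resources(resources, factor):
    """Multiply every resource figure by the same factor."""
    return {name: value * factor for name, value in resources.items()}


ztf_estimate = scale_resources(PTF_RESOURCES, ZTF_SCALE_FACTOR)
for name, value in ztf_estimate.items():
    print(f"{name}: {value}")
```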

PTF Processing Mechanics

• Software is a mixture of new, community, and IPAC heritage code. Wrappers knit it together, database used to track everything.

• Highly parallelized. Quantization is CCD (detector area) based: 0.65 square degrees at a time.

• Core team at IPAC, drawing on other IPAC expertise for specific tasks as needed.

• Graduate students and post-docs within the collaboration played a significant analysis role. Ties to the science community are extraordinarily important.
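The CCD-based quantization described above can be sketched as a fan-out of one exposure into independent per-CCD jobs. This is a minimal illustration, not the PTF code: `process_ccd`, `process_exposure`, the exposure identifier, and the choice of four CCDs are all hypothetical, and a thread pool stands in for whatever job scheduler distributes work across the drones.

```python
from concurrent.futures import ThreadPoolExecutor

CCD_AREA_SQ_DEG = 0.65  # per-CCD sky area quoted on the slide


def process_ccd(job):
    """Hypothetical stand-in for the per-CCD pipeline wrapper."""
    exposure_id, ccd_id = job
    # A real wrapper would run calibration and source extraction here.
    return {"exposure": exposure_id, "ccd": ccd_id, "area_sq_deg": CCD_AREA_SQ_DEG}


def process_exposure(exposure_id, ccd_ids, max_workers=4):
    """Fan one exposure out into independent per-CCD jobs."""
    jobs = [(exposure_id, ccd_id) for ccd_id in ccd_ids]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves job order, so results line up with ccd_ids.
        return list(pool.map(process_ccd, jobs))


results = process_exposure("exp0001", ccd_ids=range(4))
print(results)
```

Because each job touches only its own CCD, the jobs share no state and can run on separate cores or machines without coordination.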

Data System Segments

• Realtime Data Processing – image subtraction, transient and solar system object detection.

• High Fidelity Daily Processing – nightly processing and recalibration for highest data quality images and source catalogs.

• Ensemble Processing – periodic construction of coadded images, processing of catalogs to create high precision light curves.

• Long-term Data Curation – storage of all raw data, processed data (images and extracted photometry), and an advanced data archive with data exploration tools, with public release.

Realtime Pipeline

• The realtime pipeline triggers on each image as soon as it is taken and arrives from the mountain.

• This is a modified version of the daily high fidelity pipeline, using fixed calibration.

• During the PTF era, this was led from LBL drawing on their SNe search expertise.

• During iPTF, IPAC developed a similar capability with new software development, with a focus on SSOs.

• Pipeline contains an image subtraction against a reference image library (described later).

• Transient source detection.

• Streak detection.

• Machine vetting via a 3rd gen algorithm developed by JPL/LANL.

• Roughly 10-minute phase lag on current system.
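The subtraction and detection steps listed above can be illustrated with a toy example. This sketch assumes the science and reference images are already aligned and flux-matched, estimates the noise from the median absolute deviation, and flags pixels above a 5-sigma threshold. The function name and threshold are assumptions; the actual pipeline's PSF matching, streak detection, and machine vetting are not represented.

```python
import numpy as np


def detect_transients(science, reference, nsigma=5.0):
    """Return (row, col) positions where the difference image exceeds nsigma."""
    diff = science - reference
    # Robust noise estimate: 1.4826 * MAD approximates sigma for Gaussian noise.
    med = np.median(diff)
    sigma = 1.4826 * np.median(np.abs(diff - med))
    mask = (diff - med) > nsigma * sigma
    return np.argwhere(mask), diff


# Synthetic data: a noisy reference, a new epoch, and one injected source.
rng = np.random.default_rng(0)
reference = rng.normal(100.0, 1.0, size=(64, 64))
science = reference + rng.normal(0.0, 1.0, size=(64, 64))
science[20, 30] += 50.0  # synthetic transient
candidates, diff = detect_transients(science, reference)
print(candidates)
```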

Nightly Data Processing

• Data flows in realtime from 48-inch to IPAC via Cahill. System kicks off after all data arrives.

• Data is archived to tape.

• Flats, biases, and other cal files assembled from ensembles of data taken throughout night.

• Astrometric and photometric calibration on a per-frame, per-chip basis. 2-3% photometry, 0.15” astrometry.

• Source extraction of all detected sources.

• Deposition into IRSA Archive system.

• Completion by next afternoon, available in archive in 1-3 days.
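Assembling calibration files from ensembles of frames, as described above, might look like the following sketch. It assumes a plain median combine for biases and flats, with each flat normalized to unit median before combining; the slides do not say which estimator the pipeline uses, and all function names are hypothetical.

```python
import numpy as np


def make_master_bias(bias_frames):
    """Median-combine a stack of bias frames with shape (n_frames, ny, nx)."""
    return np.median(bias_frames, axis=0)


def make_master_flat(flat_frames, master_bias):
    """Bias-subtract, normalize each flat to unit median, then median-combine."""
    debiased = flat_frames - master_bias
    norms = np.median(debiased, axis=(1, 2), keepdims=True)
    master = np.median(debiased / norms, axis=0)
    return master / np.median(master)


def calibrate(raw, master_bias, master_flat):
    """Apply bias subtraction and flat-field division to one raw frame."""
    return (raw - master_bias) / master_flat


# Synthetic check: a smooth sensitivity pattern and a 100-count bias level.
pattern = np.linspace(0.9, 1.1, 16).reshape(4, 4)
biases = np.full((5, 4, 4), 100.0)
flats = 100.0 + np.array([1000.0, 2000.0, 3000.0])[:, None, None] * pattern
master_bias = make_master_bias(biases)
master_flat = make_master_flat(flats, master_bias)
calibrated = calibrate(100.0 + 500.0 * pattern, master_bias, master_flat)
```

In this synthetic case the calibrated frame comes out flat, because the sensitivity pattern imprinted on the raw frame is divided back out.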

Deep Sky Coadds aka “Reference Images”

• Required inputs for image differencing. Also forms the backbone of the relative photometry pipeline via deep reference catalogs.

• Pipeline system designed to trigger when analysis of incoming data indicates enough new data has accumulated to increment existing deep coadds by half a magnitude.

• Images internally aligned to each other, reprojected and recombined with sophisticated outlier rejection.

• Many limits and checks on input data.
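The half-magnitude trigger for rebuilding a coadd can be made concrete under a simplifying assumption: if all frames are of equal quality and background-limited, signal-to-noise grows as the square root of the number of frames, so the limiting magnitude deepens by 1.25 log10(N_total / N_current). The real trigger analyzes the incoming data itself and is surely more detailed; the function names below are hypothetical.

```python
import math

DEPTH_STEP_MAG = 0.5  # trigger threshold quoted on the slide


def depth_gain_mag(n_in_coadd, n_new):
    """Approximate gain in limiting magnitude from adding n_new frames.

    Assumes equal-quality, background-limited frames (S/N ~ sqrt(N)).
    """
    if n_in_coadd <= 0:
        raise ValueError("coadd must already contain at least one frame")
    return 1.25 * math.log10((n_in_coadd + n_new) / n_in_coadd)


def should_rebuild(n_in_coadd, n_new, step=DEPTH_STEP_MAG):
    """True once the accumulated new frames deepen the coadd by at least step."""
    return depth_gain_mag(n_in_coadd, n_new) >= step


print(should_rebuild(10, 15), should_rebuild(10, 16))
```

Under this assumption a half-magnitude gain requires roughly 2.5 times as many frames as the existing coadd contains, so each successive rebuild needs more new data than the one before it.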

Deep Sky Coadds aka “Reference Images”

Figure panels: single image (60 sec, R band); Field 5257, Chip 7, stack of 34.
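The outlier rejection used when recombining frames into a coadd can be illustrated with an iterative sigma-clipped mean. This is a generic technique chosen for illustration, not necessarily the one the pipeline uses; it assumes the frames are already aligned and reprojected onto a common grid, and the function name and thresholds are hypothetical.

```python
import numpy as np


def sigma_clipped_mean(stack, nsigma=3.0, max_iters=5):
    """Per-pixel mean of aligned frames (n_frames, ny, nx), rejecting outliers."""
    data = np.array(stack, dtype=float)  # copy, so the input is left untouched
    for _ in range(max_iters):
        center = np.nanmedian(data, axis=0)
        # Robust per-pixel scatter from the median absolute deviation.
        sigma = 1.4826 * np.nanmedian(np.abs(data - center), axis=0)
        reject = np.abs(data - center) > nsigma * sigma
        if not reject.any():
            break
        data[reject] = np.nan  # rejected samples are ignored from here on
    return np.nanmean(data, axis=0)


# Synthetic stack of 34 frames with one cosmic-ray hit in a single frame.
rng = np.random.default_rng(1)
stack = rng.normal(100.0, 1.0, size=(34, 8, 8))
stack[3, 5, 5] = 10000.0
coadd = sigma_clipped_mean(stack)
naive = stack.mean(axis=0)
```

A plain mean lets the single bad pixel dominate that position in the stack, while the clipped mean stays close to the sky level.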

Deep Sky Coadds aka “Reference Images”

* Results not typical. Near Galactic Center.

Relative Photometry aka “Lightcurve Pipeline”

• Source association through positions across epochal apparitions (in catalog space), using reference catalogs derived from deep reference images.

• Additional processing that computes image-wide delta corrections to regular pipeline photometry, using all apparitions of sources on that chip/field. Layers on top of existing catalog data.

• Achieves few-millimag performance.

• Has been running in experimental form for several years. Now incorporated into the regular online processing.

• Significant computer science complexities in handling the data stream in finite time for daily updates.
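The image-wide delta corrections described above can be sketched as a per-image zero-point offset measured against the reference catalog. This simplified version assumes sources are already associated across epochs, uses one offset per image, and estimates it with a median; the actual lightcurve pipeline is presumably more elaborate, and the function names are hypothetical.

```python
import numpy as np


def image_zero_point_deltas(mags, ref_mags):
    """Median offset of each image from the reference catalog.

    mags: (n_epochs, n_sources), NaN where a source was not detected.
    ref_mags: (n_sources,) magnitudes from the deep reference catalog.
    """
    return np.nanmedian(mags - ref_mags, axis=1)


def apply_deltas(mags, deltas):
    """Subtract each image's offset from all of its source magnitudes."""
    return mags - deltas[:, None]


# Synthetic example: three epochs with small zero-point errors.
ref_mags = np.array([15.0, 16.0, 17.0, 18.0])
true_offsets = np.array([0.03, -0.02, 0.0])
mags = ref_mags + true_offsets[:, None]
mags[0, 1] = np.nan  # one missed detection
deltas = image_zero_point_deltas(mags, ref_mags)
corrected = apply_deltas(mags, deltas)
```

Because the correction is computed from the catalogs alone, it layers on top of the existing photometry without reprocessing any images.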

Public Web Pages

As part of the public data release, we have commissioned a new web portal for PTF, which includes the path to the data archive as well as project documentation. This was designed deliberately to be readily extensible to ZTF.

http://ptf.caltech.edu

Public Data Release

Released ~190k images and catalogs, or 10% of the existing PTF-era data, from 6 separate regions on the sky.

Public Archive

The archive offers both an interactive GUI for image and catalog file discovery and VO-compliant software APIs, which are currently in use by several science programs.

Lead-In to ZTF

• PTF and iPTF are the direct pathfinders for ZTF.

• Data system is now mature. A few remaining segments are being completed now and will be in place at least a year prior to ZTF.

• Most significant is the archive system interface for the catalogs and light curves, which will be implemented for the year 2 data release.

Major ZTF Tasks

• Adaptation to substantially greater data rates. Parallelization will still be spatial. Data system design allows replication in PTF-like subunits.

• Adaptation to any new detector peculiarities.

• Retuning of the realtime transient pipeline, specifically reworking the machine vetting process for transient candidates.

• Development of the VO alert subscription service.
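The spatial parallelization in PTF-like subunits mentioned in the first task above could be sketched as a static assignment of detector areas to processing subunits. The contiguous, near-equal split below is an assumption for illustration only; the slides do not say how work would be divided, and the function name and counts are hypothetical.

```python
def partition_ccds(ccd_ids, n_subunits):
    """Split CCD ids into n_subunits contiguous, near-equal groups."""
    ccd_ids = list(ccd_ids)
    base, extra = divmod(len(ccd_ids), n_subunits)
    groups, start = [], 0
    for i in range(n_subunits):
        # The first `extra` groups take one additional CCD each.
        size = base + (1 if i < extra else 0)
        groups.append(ccd_ids[start:start + size])
        start += size
    return groups


groups = partition_ccds(range(10), 3)
print(groups)
```

Each group can then be handed to a subunit that runs the same per-CCD pipeline, so adding capacity means adding subunits rather than redesigning the system.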
