![Page 1: IRIS-HEP Analysis Systems Team · the user- facing semantics of physics analysis. •Leverage and align with developments from industry and the broader scientific software community](https://reader034.vdocuments.site/reader034/viewer/2022050107/5f44e6a5c2a41845da03f6e2/html5/thumbnails/1.jpg)
IRIS-HEP Analysis Systems Team
Kyle Cranmer
![Page 2: IRIS-HEP Analysis Systems Team · the user- facing semantics of physics analysis. •Leverage and align with developments from industry and the broader scientific software community](https://reader034.vdocuments.site/reader034/viewer/2022050107/5f44e6a5c2a41845da03f6e2/html5/thumbnails/2.jpg)
Overall R&D goal for Analysis Systems
• Develop sustainable analysis tools to extend the physics reach of the HL-LHC
experiments by creating greater functionality, reducing time-to-insight,
lowering the barriers for smaller teams, and streamlining analysis
preservation, reproducibility, and reuse.
![Page 3: IRIS-HEP Analysis Systems Team · the user- facing semantics of physics analysis. •Leverage and align with developments from industry and the broader scientific software community](https://reader034.vdocuments.site/reader034/viewer/2022050107/5f44e6a5c2a41845da03f6e2/html5/thumbnails/3.jpg)
Analysis Systems Scope
[Diagram: analysis pipeline, from Production System to Analysis Files to the stages below]
• Scan data, explore with histograms, making plots: scikit-hep, awkward array, Parsl
• Fitting, manipulation, limit extrapolation: pyhf, HistFactory v2, GooFit, Decay Language
• Capture & Reuse (archiving, publication, reinterpretation, etc.): Analysis Database, Recast, CAP/INSPIRE/HEPData
• Underlying framework: Analysis Systems, analysis & declarative languages
• Leverage & align with industry
• Training & workforce development
• Partner focus areas: DOMA, SSL
![Page 4: IRIS-HEP Analysis Systems Team · the user- facing semantics of physics analysis. •Leverage and align with developments from industry and the broader scientific software community](https://reader034.vdocuments.site/reader034/viewer/2022050107/5f44e6a5c2a41845da03f6e2/html5/thumbnails/4.jpg)
Context
• Compared to DOMA and IA (which has more targeted reco/trigger goals), the Analysis Systems group is dealing with a more “greenfield” area where there is a very heterogeneous set of use cases and relevant components
• The nature of AS tasks will be more exploratory and “big R”
• The AS group is bringing together a few existing groups
  • DASPOS and capture/reproducibility/reuse components of DIANA
  • Scikit-hep and Jim’s efforts on interoperability and query-based systems
  • High-performance statistical analysis tools (e.g. GooFit, HistFactory, pyhf, etc.)
• And adding a new connecting theme: declarative specifications
![Page 5: IRIS-HEP Analysis Systems Team · the user- facing semantics of physics analysis. •Leverage and align with developments from industry and the broader scientific software community](https://reader034.vdocuments.site/reader034/viewer/2022050107/5f44e6a5c2a41845da03f6e2/html5/thumbnails/5.jpg)
A point of convergence
Several aspects of Analysis Systems converge in a typical physics plot:
• Specification of signal / validation / control regions
• Specification of variables to be used for stat analysis
• Reduction to that format running on data and MC
• Management of MC samples, data-driven backgrounds, etc.
• Management of systematic variations
• Feed reduced data (e.g. histograms) into specification for statistical model / likelihood function
• Fitting & statistical tools
• Publishing results & derived data products
• Analysis preservation & gateways targeting reinterpretation
![Page 6: IRIS-HEP Analysis Systems Team · the user- facing semantics of physics analysis. •Leverage and align with developments from industry and the broader scientific software community](https://reader034.vdocuments.site/reader034/viewer/2022050107/5f44e6a5c2a41845da03f6e2/html5/thumbnails/6.jpg)
Focus areas
• Establish declarative specifications for analysis tasks and workflows that will
enable the technical development of analysis systems to be decoupled from
the user-facing semantics of physics analysis.
• Leverage and align with developments from industry and the broader
scientific software community to enhance sustainability of the analysis
systems.
• Develop high-throughput, low-latency systems for analysis for HEP.
• Integrate analysis capture and reuse as first class concepts and capabilities
into the analysis systems.
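The decoupling named in the first focus area can be illustrated with a small sketch. This is not an IRIS-HEP tool or format: the spec layout, the region names, the field names (`n_leptons`, `met`), the thresholds, and the helper `select` are all hypothetical, chosen only to show the physicist-facing description (data) kept separate from the code that executes it.

```python
import operator

# Hypothetical declarative spec: regions are described as data, not code.
# Field names and thresholds are invented for illustration.
REGIONS = {
    "signal": [("n_leptons", "==", 2), ("met", ">", 100.0)],
    "control": [("n_leptons", "==", 2), ("met", "<=", 100.0)],
}

# The "analysis system" side: one possible backend that interprets the spec.
OPS = {
    "==": operator.eq,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def select(events, cuts):
    """Return the events that pass every (field, op, value) cut."""
    return [
        ev for ev in events
        if all(OPS[op](ev[field], value) for field, op, value in cuts)
    ]


# Toy events (invented values, not real data).
events = [
    {"n_leptons": 2, "met": 150.0},
    {"n_leptons": 2, "met": 40.0},
    {"n_leptons": 1, "met": 300.0},
]

# Event yield per region, computed without any region-specific code.
yields = {name: len(select(events, cuts)) for name, cuts in REGIONS.items()}
print(yields)
```

Because the regions live in `REGIONS` rather than in the loop, the interpreter could in principle be swapped for a faster or distributed backend without touching the spec.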
![Page 7: IRIS-HEP Analysis Systems Team · the user- facing semantics of physics analysis. •Leverage and align with developments from industry and the broader scientific software community](https://reader034.vdocuments.site/reader034/viewer/2022050107/5f44e6a5c2a41845da03f6e2/html5/thumbnails/7.jpg)
Analysis Systems Team
NYU: Kyle Cranmer, TBD postdoc, TBD application developer
UIUC: Mark Neubauer, Dan Katz, Ben Galewsky, TBD postdoc
UW: Gordon Watts, Mason Proffitt, TBD postdoc
Princeton: Jim Pivarski, Vassil Vassilev
Cincinnati: Mike Sokoloff, Tim Evans
![Page 8: IRIS-HEP Analysis Systems Team · the user- facing semantics of physics analysis. •Leverage and align with developments from industry and the broader scientific software community](https://reader034.vdocuments.site/reader034/viewer/2022050107/5f44e6a5c2a41845da03f6e2/html5/thumbnails/8.jpg)
Partnerships
• External
  • Open Source Data Science Tools: Dask, Apache Arrow, pandas, Jupyter, …
  • Statistics and ML-analysis tools: PyTorch, TensorFlow, MXNet, Pyro, ONNX, ...
  • Industry ML: FAIR, DeepMind, Amazon, NVIDIA
  • SCAILFIN (NSF grant: Workflows + Machine Learning: Hildreth, Cranmer, Neubauer)
  • Astro. & Cosmo. (via stats. & likelihood-free inference), Genomics (via workflows)
  • Parsl, Common Workflow Language, GitHub, etc.
  • Science Gateways Community Institute
  • CERN IT via INSPIRE, HEPData, CAP, REANA, …
  • HSF analysis group
• Internal
  • DOMA iDDS
  • SSL
  • Sustainable Core
  • OSG
![Page 9: IRIS-HEP Analysis Systems Team · the user- facing semantics of physics analysis. •Leverage and align with developments from industry and the broader scientific software community](https://reader034.vdocuments.site/reader034/viewer/2022050107/5f44e6a5c2a41845da03f6e2/html5/thumbnails/9.jpg)
Backup
![Page 10: IRIS-HEP Analysis Systems Team · the user- facing semantics of physics analysis. •Leverage and align with developments from industry and the broader scientific software community](https://reader034.vdocuments.site/reader034/viewer/2022050107/5f44e6a5c2a41845da03f6e2/html5/thumbnails/10.jpg)
External Collaboration: SCAILFIN
• Not developing new methods, but implementing existing ones in scalable distributed systems
• Theme = ML methods + Workflows and distributed systems
• Emphasis on use-cases that involve simulation+ML together and are iterative in nature (not static bulk processing)
![Page 11: IRIS-HEP Analysis Systems Team · the user- facing semantics of physics analysis. •Leverage and align with developments from industry and the broader scientific software community](https://reader034.vdocuments.site/reader034/viewer/2022050107/5f44e6a5c2a41845da03f6e2/html5/thumbnails/11.jpg)
pyhf: python implementation of HistFactory (Cranmer)
The HistFactory v1 specification, implemented in ROOT, is widely used in ATLAS and is similar to CMS Combine for binned models. It is now also implemented in pyhf.
M. Feickert Talk on pyhf at DIANA/HEP
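To make "binned model" concrete, here is a minimal sketch of the kind of likelihood such tools evaluate, written in plain Python. It deliberately does not use or imitate the pyhf API: the function names are hypothetical, the bin contents are invented, and the constraint terms that a full HistFactory model uses for systematic uncertainties are omitted. Each bin contributes a Poisson term with expected rate mu*s + b, where mu is the signal strength.

```python
import math


def poisson_logpmf(n, lam):
    """log Poisson(n | lam) for integer n >= 0 and lam > 0."""
    return n * math.log(lam) - lam - math.lgamma(n + 1)


def log_likelihood(mu, observed, signal, background):
    """Sum of per-bin Poisson log-terms with expected rate mu*s + b."""
    return sum(
        poisson_logpmf(n, mu * s + b)
        for n, s, b in zip(observed, signal, background)
    )


# Invented two-bin example (not real data).
signal = [5.0, 10.0]
background = [50.0, 60.0]
observed = [55, 70]

# Crude grid scan for the best-fit signal strength; a real tool would
# use a numerical minimizer and include nuisance parameters.
grid = [i / 100 for i in range(0, 301)]
mu_hat = max(
    grid,
    key=lambda mu: log_likelihood(mu, observed, signal, background),
)
print(mu_hat)
```

The grid scan is only there to keep the sketch dependency-free; the point is that the whole model is determined by a few per-bin numbers, which is what makes a declarative specification of it practical.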
![Page 12: IRIS-HEP Analysis Systems Team · the user- facing semantics of physics analysis. •Leverage and align with developments from industry and the broader scientific software community](https://reader034.vdocuments.site/reader034/viewer/2022050107/5f44e6a5c2a41845da03f6e2/html5/thumbnails/12.jpg)
CERN IT connections
![Page 13: IRIS-HEP Analysis Systems Team · the user- facing semantics of physics analysis. •Leverage and align with developments from industry and the broader scientific software community](https://reader034.vdocuments.site/reader034/viewer/2022050107/5f44e6a5c2a41845da03f6e2/html5/thumbnails/13.jpg)
Connections with Science Gateways Community Institute
• Gateways are ideal for improving the Theory/Experiment interface
• E.g. reinterpretation and “Recasting”
[Plot: number of papers in hep-ph using the term "Recast"]
![Page 14: IRIS-HEP Analysis Systems Team · the user- facing semantics of physics analysis. •Leverage and align with developments from industry and the broader scientific software community](https://reader034.vdocuments.site/reader034/viewer/2022050107/5f44e6a5c2a41845da03f6e2/html5/thumbnails/14.jpg)
Connections with SSL
https://youtu.be/2PRGUOxL36M?t=15m43s
Containerd maintainer
![Page 15: IRIS-HEP Analysis Systems Team · the user- facing semantics of physics analysis. •Leverage and align with developments from industry and the broader scientific software community](https://reader034.vdocuments.site/reader034/viewer/2022050107/5f44e6a5c2a41845da03f6e2/html5/thumbnails/15.jpg)
Connections with OSG
• Work toward interoperability between containerized end-user analysis jobs (which are natural with GitLab Continuous Integration and CAP/REANA) and GRID jobs.
• Common objective to solve image distribution at GRID scale (e.g. CVMFS containerd integration)
E.g. see Blomer's ACAT poster
![Page 16: IRIS-HEP Analysis Systems Team · the user- facing semantics of physics analysis. •Leverage and align with developments from industry and the broader scientific software community](https://reader034.vdocuments.site/reader034/viewer/2022050107/5f44e6a5c2a41845da03f6e2/html5/thumbnails/16.jpg)
Components of the Analysis Language Hierarchy
[Diagram: three levels, ordered by increasing domain knowledge]
• In Memory/File Layout (TTree, numpy, jagged array): Data model contains all information, field and experiment agnostic
• Structured Query (numpy, pandas, RDataFrame, LINQ): Data model contains object definitions, data structure is part of the language, experiment agnostic
• Query with Domain Knowledge (CutLang): The electron is a first-class object, specific to a class of experiment.
Analysis languages translate the intent of the physicist into the code that does the work. They can be loosely arranged by how much domain knowledge they contain, from binary in-memory/file formats that are very flexible to languages that are really only appropriate for a particular type of experiment (LHC collider, or perhaps a large nuclear experiment).
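The three levels can be sketched in a few lines of plain Python. This is an illustration only, not the API of any of the tools named above: the function names, the electron values, and the 25.0 threshold are all invented. The same question ("how many good electrons per event?") is answered by building each level on the one below it.

```python
# Level 1: in-memory layout. A jagged array stored as flat content plus
# offsets; event i owns content[offsets[i]:offsets[i + 1]].
# Values are invented for illustration.
electron_pt = [45.0, 12.0, 30.0, 8.0, 60.0]
offsets = [0, 2, 2, 5]  # three events with 2, 0 and 3 electrons


def event_slice(content, offsets, i):
    """Return the values belonging to event i."""
    return content[offsets[i]:offsets[i + 1]]


# Level 2: structured query. Generic per-event filter and count,
# with no physics meaning attached to the values.
def count_where(content, offsets, predicate):
    """Per-event count of values satisfying predicate."""
    return [
        sum(1 for x in event_slice(content, offsets, i) if predicate(x))
        for i in range(len(offsets) - 1)
    ]


# Level 3: query with domain knowledge. "Electron" is a named concept and
# the storage layout is hidden from the caller.
def n_good_electrons(min_pt=25.0):
    """Per-event count of electrons above a hypothetical pt threshold."""
    return count_where(electron_pt, offsets, lambda pt: pt > min_pt)


counts = n_good_electrons()
print(counts)
```

Moving up the levels trades flexibility for brevity: the level 1 layout can hold anything, while the level 3 call only makes sense for an experiment that has electrons.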