addressing the data challenges at the advanced light source › nitrdgroups › images › f › ff...

20
Addressing the Data Challenges at the Advanced Light Source Alexander Hexemer Senior Staff Scientist Program Lead for Computing Advanced Light Source Center for Advanced Mathematics for Energy Research Applications (CAMERA) Lawrence Berkeley National Lab ADVANCED LIGHT SOURCE

Upload: others

Post on 07-Feb-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

  • Addressing the Data Challenges at the Advanced Light Source

    Alexander HexemerSenior Staff Scientist Program Lead for Computing Advanced Light SourceCenter for Advanced Mathematics for Energy Research Applications (CAMERA)Lawrence Berkeley National Lab

    ADVANCED LIGHT SOURCE

  • We can’t do it alone!!!

  • Acknowledgment Lawrence Berkeley National Lab

    • Ronald J Pandolfi

    • Dinesh Kumar

    • Dylan McReynolds

    • Singanallur Venkatakrishnan (now ORNL)

    • Luis Barrosso-Luque

    • Holden Parks

    • Austin Blair

    • Dilworth Parkinson

    • Shuai Liu

    • Nathan Melton

    • Andrew Wiedlea

    • Debbie Bard

    • Hari Krishnan

    • Krishna Muriki

    • Dani Ushizima

    • James A Sethian

    3

    Funding Acknowledgments

    • LBNL LDRD Program

    • DOE Early Career Award

    • CAMERA

    Argonne National Lab

    • Zhang Jiang

    • Doga Gursoy

    • Francesco De Carlo

    • Xianghui Xiao

    • Ian Foster

    • Nicholas Schwarz

    • Ryan Chard

    SLAC

    Amanda Fournier

    Fang Ren

    Yury Kolotovsky

    Apurva Mehta

    Chris Tassone

    Amedeo Perazzo

    Brookhaven National Lab

    Masafumi Fukuto

    Kevin Yaeger

    Thomas Caswell

    Stuart Campbell

    ''

    BROOKHAVEN '.'\J /\ l I O f\ ,:\ I I ,\ B O R .-\ I O R Y

    ,~ ALS ~Jf ADVANCED LIGHT SOURCE

    ~ ~10

    BERKELEY LAB SCIENCE-IT

    GitHub~ El Read the Docs

  • Center for Advanced Mathematics for Energy Research Applications

    Include latest theory and math, take advantage of latest architecture:

    Multi CPU/GPU, open source, everything shared, many collaborationswww.camera.lbl.gov

    4

    DOE: BES and ASCRRenewed last year

    Autonomous Data Collection

    Machine Learning

    Autonomous Data Collection

    C011nKled

    Ma, P«ll(l c.or.. (USI Mli,-c,ol[2 c.on.1su1 u) · 01 -- ·~ '""I u•uxw "K.wxza

    !16X'lliXM

    Machine Learning

    Nanocrystallography

    Electronic Structure

    Fluctuation Scattering

    Material Informatics

    Ptychography

    GISAXS

    Image Analysis

    Jamie Sethian Head of CAMERA

    http://www.camera.lbl.gov/

  • Challenges for User facilities

    • >20% new users

    • Provide very fast feedback and/or experiment combining data from different modalities

    • Applying custom workflow to many data sets

    • Need for new math and algorithms

    • Make things easy and faster for users

    • Data access across facilities

    • Large amounts of data

    0

    5

    10

    15

    20

    25

    30

    35

    20

    18

    20

    19

    20

    20

    20

    21

    20

    22

    20

    23

    20

    24

    20

    25

    20

    26

    20

    27

    20

    28

    PB

    /yea

    r

    Complex Raw Data (prediction ALS)

    ADVANCED LIGHT SOURCE

  • New Sources are on the Horizon ADVANCED LIGHT SOURCE

    C\J.-, 1022 ,:, ro ~

    ~ 1015 E E 1021 CD 0

    ' E 0 ,--~ 0

    ~ 1020 u Cl) 0 ~ 1014 ,--0 a.. ......... ~ 1019 4" X :::, \ en LL - J .c APS a.. - \ / C

    ' I .........

    ~ 1013 ~ 1018 \ \ Cl) ~ (l)

    \ .c C 0 -.c 0 0) \ ~ 1017

    I 1012

    soft x-rays I soft x-rays 1016

    101 102 103 104 105 101 102 103 104 105

    Photon Energy [eV] Photon Energy [eV]

  • • Development of the standards for data storage and file formats

    • Create reports and outline ways for the light sources to work together.

    • Explore the topic of real time computing at the 5 light sources and into the

    future.

    • Data access across facilities

    • Regular meetings online and in person.

    The Light Source Data Working Group The purpose of the DWG is to serve as a resource for the 5 BES light source Facility Directors to call

    upon to provide information and recommendations on working together in the areas of data and

    computing.

    ALS (Alexander Hexemer), APS (Nicholas Schwarz) LCLS II (Amedeo Perazzo), NSLS II (Stuart Campbell)

  • • NERSC (Debbie Bard et al.): HPC API specs are currently developed, GPGPUs

    • Lab science IT (Andrew Wiedlea et al.): GPGPUs, data storage, infrastructure, Mongo and many other ideas

    • ESNet (Kate Mace et al.): Network challenges

    • Data Acquisition Group (Kevan Anderson et al.): Data and Metadata access (NEXUS), development of a python/scripting interface to allow for analysis driven control of the beamline

    • CAMERA (Jamie Sethian et al.): Analysis, Xi-CAM, cam link, HPC API, sharp and many more

    Development of the Data Movement in Collaboration with other Divisions of LBNL ADVANCED LIGHT SOURCE

  • GIWAXS scattering on printed OPVs

    Design Model

    9

  • 10

    Full data lifecycle: Xi-CAM-Link [Data]

    remote

    event loopframe

    grabber

    exp.

    control

    remote

    event loop

    process

    images

    ptycho

    SHARP

    Compute clusterExperiment /

    data acquisition

    Execution steps

    1.Identify and setup resources

    2.Launch services

    3.Connect network

    4.Execute graph

    Remote Storage/Tape Archive• Compute• Publishing• Archiving

    Edge Services –DataBroker: Event Based• Metadata• Provenance• Data TrackingMongoDB• Workflow State• Data Cataloging• Data Tagging• LoggingCAM-Mart - Algorithm Marketplace

    Local StorageQC, Triage, Local Analysis (FFB)

    BROOKH \J ,.\ I I Or,,; ,:\ I AVEN , l ,\ I\ ( l [{ ,.\ I ( l Ry

  • 11

    Full data lifecycle: Xi-CAM-Link [Analysis]

    Graphical User Interface

    (GUI)

    remote

    event loopframe

    grabber

    exp.

    contro

    l

    remote

    event loop

    process

    images

    ptych

    o

    SHAR

    P

    User / local computer

    Compute clusterExperiment /

    data acquisition

    Execution steps

    master

    event loop

    remot

    e

    contro

    l

    elog

    1.Identify and setup resources

    2.Launch services

    3.Connect network

    4.Execute graph

    CAM-Link

    • Compression• Preprocessing• Analysis close to detector

    Python-based• Perform Queries,

    Searches• Programmable

    Pipelines• Visualization

    Python-based• Remote Workers• Support

    Docker/Singularity (WF Provenance)

    . /11!i1/ ' -- ,., .. l I ~ - , __ ... - , . """ '! .i ;. ~ - (~-

    ~-- .-;.,: ~i. . , : .... " ' ...... ,. - . .. . ' 0 .

    ca

  • Deployment of Mathematical Algorithms: Xi-cam

    Scientific Achievement

    Development of a community-maintainable

    platform for new analysis and visualization

    techniques for synchrotrons.

    Research Details

    –Remote processing with HPC for real-time

    high data rate analysis

    –remote data access for high-volume data

    retrieval

    –Highly interactive design

    TomographyTime resolved SAXSElectron MicroscopyNEXAFSiPythonGlobus OnlineDatabroker

    Remote ComputingGISAXS HIPGISAXS GIWAXS simulatorReverse Monte CarloCD-GISAXSPlug-in Store

    Xi-CAM plug-ins

    BNL SMI Beamline

    Result of DOE Early Career Awardcam

    APS 2-BM APS 8-1D-E BNL 11-BM

    Batch H1pGISAXS I Log I llmel1ne I Tomography View er I 30 View er

    = ~

    vuo:n 2m 00101.edf Yll031-2m -00108.edf YllOJl - 2m - 00109.edf YL1031-2m-00110.edf 'r'L1031-2m-00111.edf YL1031-2m-00112.edf Yl1031-2m-00113.edf Yll031=2m~00114.edf

    / 0 1-ccU--',lllJ.!. q-11 lllh 1\ q.,.. \J OUh A (I ----c(l 111

    ~ A+ B A · B

    Xi-cam: A versatile interface for data visualization and analysis Ronald J. Pandolfi/* Daniel B. Allan/ Elke Arenholz;1 Luis Barroso•Luque/ Stuart 1. Campbell / Thomas A. Caswell/ Austin Blair/ Francesco De Carlo,C Sean Fackler/ Amanda P. Foumier,b Guillaume Freychet/ Masa• fumi Fukuto/' Doga Giirsoy,C·h Zhang Jiang/ Harinarayan Krishnan/ Dinesh Kumar/ R. Joseph Kline,& Ruipeng Li/ Christopher Liman,s Stefano Marchesini,' Apurva Mehta," Alpha T. N' Diaye,' Dilworth (Dula) Y. Parkinson/ Holden Parks/ Lenson A. Pellouchoud,a Talita Perciano/ Fang Ren,b Shreya Sahoo/ Joseph Stnalka,C Daniel Sunday,s Christo-pher J. Tassone,;a Daniela Ushizima,;a SinganallurVenkatakrishna.n,d Kevin G. Yager/ James A. Sethian;a and Alexander Hexeme~

    H1pRMC I !Python

    Calibrant ~teri.! t,,;~.'

    A:1 ,\.I rr

  • Scientific Aims

    Efficient platform for rapid

    parallel screening and materials

    discovery.

    Data-driven feedback directly

    drives material synthesis in-situ.

    Significance and Impact

    Prototype system at

    GIWAXS/XRD beamline at

    SSRL for development.

    Material properties and

    characterizations computed on-

    the-fly; ~1 second per sample.

    On-the-fly Data Assessment for High-Throughput XRD

    http://pubs.acs.org/doi/pdf/10.1021/acscombsci.7b00015

    Collaboration with C. Tassone (SSRL) and M. Apurva (SSRL)

  • 14

    Remote Execution

    GISAXS Workshop in Bayreuth in collaboration with CAMERA

    ~ UNIVERSITAT l[Lj BAYREUTH GISAS

    Summer School

    2018

    I 1 I ~IIL ~~~ , ,,, I

    ,,

    ,,

    .l ••••

    File Plug1ns Test:n9 Active Sess ion (loca lhost)

    Batch I H1pGISAXS I !Python I L□Q I limel1ne I Tom□Qraphy I Viewer I 30 Viewer

    Welcome ta Xi-cam

    © 0 / • i5 bin • ii boot • ii dev • ii etc • ii home • iim lib

    ~ A+ B A · B A / 8 AVG

  • The High-throughput NEXAFS workflowcollaboration with Materials Project

    15

    bl 6.3.1

    Materials Project website

    Quantities

    Processing/reduction

    AnalysisResults

    Public

    Xi-CAM

    soon

    MPCantribs Framework

    ! ::I

    ,.

    V

    Co

    i I cam

    T Harooss111g tho powlll' of suporcompuung ond stnto-of tho ort Dklctton

    he suue1,mm01hod,th0Matcoa•P,01ectp,0"""-·•·ba""' < M t • I to computod information on known oNt p,edictcd m.itonnts as wotl ::cess

    a er1a S powerfulanat-,sistoo1Sto1nsl)lreanddesioo-., Project ..-;;;;:=, ·~,.,m"'""' ~ io .-. ... nu

    JO~OU IIIAIUIAU -----·--·--

    0 ~~ou•u1HIU

    ,.. __ ----•Gil __ .. ---

    .. ~ '115!,lt.l.lU

    •••t,un

  • Deep learning for X-ray Scattering

    Ushizima, Kumar, Venkatakrishnan, Hexemer. “At the convergence of GISAXS and deep learning”, IX CANSAS Conference, Berkeley and Menlo Park, CA, Jun 2017.Ushizima, “High Throughput Reverse Image Search: Quantification, Search, Retrieval and Ranking for Multi-modal Imaging”, BNL NSLS User Meeting, NY, May 2017.

    Queries Best matches

    Input : 2D image of cristal diffraction (cubic, hexagonal … packing)

    Example 1: Classify nanostructures by packing

    Generate a model which predict with 80% accuracy the different space groups

    Example 2: Classify between SAXS/GISAXS data

    Example 3: Using Fourier transform to improve learning

    Reduce image size for classification in Fourier space to limit the lost of information

    1024 * 1024128 * 12816* 16

    Truncate frequencies in the Fourrier space to retain more information

    Automatize the data classification on beamlines (7.3.3 beamline)

    Generate a model which predict with 98% accuracy the different space groups

    Collaboration K.Yager and M. Fukuto (NSLS2) Rippel et al. (2015)

    Method Original Data Set

    Noisy Data

    Convolutional Neural Networks

    92.3% 91.6%

    Histogram of Oriented Gradients

    92% 80%

    50

    100

    150

    50 100 150

    Max pooling

    Spectral pooling

  • Use Case in Development: X-ray Photon Correlation Spectroscopy (XPCS)

    M. Sutton, C. R. Physique, 9, 657 (2008).

    G. Grübel and F. Zontone, J. Alloys Compd. 362, 3 (2004).

    The g2 curve has time information at the specific q-value (length scale).

    t

    Time delay

    Corr

    ela

    tion

    Speckle motion at length scale dn of sample.

    New Beamline development

    Interface: Xi-CAM

    HPC Code for XPCS

    Control: Bluesky and DataBroker

    . .. -3 . .. . . . .. . .. .. .. ' .t +:

    .... . . . . . . . ... . . . . ,, . . .. •I . . . ... o •• . . .. .. ., . . . .. . . . . . . .. . .

    1.15 r--~--g-,-:-(q~,,--;-)- =-·. 1

    i. " I' 1M

    (l(q,t )l(q,t H ))

    (I(q,1) )2 ... 10' 10' frame delay

    10'

    ' I I . \; I ~ - t I . r • . \ , .

    ( \ . \ , •· -I' ; ' ~ , ·, -r ~"'

    .• • \.> . .Ir

    ADVANCED LIGHT SOURCE

    BROOKHAVEN . 'J\1 101\'\ I I \IIOR\IOR)

  • Future of Creative Discovery

    Next Generation of Creative Tools for Analysis• Visual and Interactive• Multi modal

    • Physics and Math engine by design

    • Full integration of ML

    KAREN ROSS Secretary of the California Department of Food and

    Agriculture

    Rick Perry Explores Tomography Data

    Spring 2018

  • Thanks

  • "Any opinions, findings, conclusions or recommendations

    expressed in this material are those of the author(s) and do not

    necessarily reflect the views of the Networking and Information

    Technology Research and Development Program."

    The Networking and Information Technology Research and Development

    (NITRD) Program

    Mailing Address: NCO/NITRD, 2415 Eisenhower Avenue, Alexandria, VA 22314

    Physical Address: 490 L'Enfant Plaza SW, Suite 8001, Washington, DC 20024, USA Tel: 202-459-9674,

    Fax: 202-459-9673, Email: [email protected], Website: https://www.nitrd.gov

    (it,NITRD -

    mailto:[email protected]://www.nitrd.gov/