addressing the data challenges at the advanced light source › nitrdgroups › images › f › ff...
TRANSCRIPT
-
Addressing the Data Challenges at the Advanced Light Source
Alexander HexemerSenior Staff Scientist Program Lead for Computing Advanced Light SourceCenter for Advanced Mathematics for Energy Research Applications (CAMERA)Lawrence Berkeley National Lab
ADVANCED LIGHT SOURCE
-
We can’t do it alone!!!
-
Acknowledgment Lawrence Berkeley National Lab
• Ronald J Pandolfi
• Dinesh Kumar
• Dylan McReynolds
• Singanallur Venkatakrishnan (now ORNL)
• Luis Barrosso-Luque
• Holden Parks
• Austin Blair
• Dilworth Parkinson
• Shuai Liu
• Nathan Melton
• Andrew Wiedlea
• Debbie Bard
• Hari Krishnan
• Krishna Muriki
• Dani Ushizima
• James A Sethian
3
Funding Acknowledgments
• LBNL LDRD Program
• DOE Early Career Award
• CAMERA
Argonne National Lab
• Zhang Jiang
• Doga Gursoy
• Francesco De Carlo
• Xianghui Xiao
• Ian Foster
• Nicholas Schwarz
• Ryan Chard
SLAC
Amanda Fournier
Fang Ren
Yury Kolotovsky
Apurva Mehta
Chris Tassone
Amedeo Perazzo
Brookhaven National Lab
Masafumi Fukuto
Kevin Yaeger
Thomas Caswell
Stuart Campbell
''
BROOKHAVEN '.'\J /\ l I O f\ ,:\ I I ,\ B O R .-\ I O R Y
,~ ALS ~Jf ADVANCED LIGHT SOURCE
~ ~10
BERKELEY LAB SCIENCE-IT
GitHub~ El Read the Docs
-
Center for Advanced Mathematics for Energy Research Applications
Include latest theory and math, take advantage of latest architecture:
Multi CPU/GPU, open source, everything shared, many collaborationswww.camera.lbl.gov
4
DOE: BES and ASCRRenewed last year
Autonomous Data Collection
Machine Learning
Autonomous Data Collection
C011nKled
Ma, P«ll(l c.or.. (USI Mli,-c,ol[2 c.on.1su1 u) · 01 -- ·~ '""I u•uxw "K.wxza
!16X'lliXM
Machine Learning
Nanocrystallography
Electronic Structure
Fluctuation Scattering
Material Informatics
Ptychography
GISAXS
Image Analysis
Jamie Sethian Head of CAMERA
http://www.camera.lbl.gov/
-
Challenges for User facilities
• >20% new users
• Provide very fast feedback and/or experiment combining data from different modalities
• Applying custom workflow to many data sets
• Need for new math and algorithms
• Make things easy and faster for users
• Data access across facilities
• Large amounts of data
0
5
10
15
20
25
30
35
20
18
20
19
20
20
20
21
20
22
20
23
20
24
20
25
20
26
20
27
20
28
PB
/yea
r
Complex Raw Data (prediction ALS)
ADVANCED LIGHT SOURCE
-
New Sources are on the Horizon ADVANCED LIGHT SOURCE
C\J.-, 1022 ,:, ro ~
~ 1015 E E 1021 CD 0
' E 0 ,--~ 0
~ 1020 u Cl) 0 ~ 1014 ,--0 a.. ......... ~ 1019 4" X :::, \ en LL - J .c APS a.. - \ / C
' I .........
~ 1013 ~ 1018 \ \ Cl) ~ (l)
\ .c C 0 -.c 0 0) \ ~ 1017
I 1012
soft x-rays I soft x-rays 1016
101 102 103 104 105 101 102 103 104 105
Photon Energy [eV] Photon Energy [eV]
-
• Development of the standards for data storage and file formats
• Create reports and outline ways for the light sources to work together.
• Explore the topic of real time computing at the 5 light sources and into the
future.
• Data access across facilities
• Regular meetings online and in person.
The Light Source Data Working Group The purpose of the DWG is to serve as a resource for the 5 BES light source Facility Directors to call
upon to provide information and recommendations on working together in the areas of data and
computing.
ALS (Alexander Hexemer), APS (Nicholas Schwarz) LCLS II (Amedeo Perazzo), NSLS II (Stuart Campbell)
-
• NERSC (Debbie Bard et al.): HPC API specs are currently developed, GPGPUs
• Lab science IT (Andrew Wiedlea et al.): GPGPUs, data storage, infrastructure, Mongo and many other ideas
• ESNet (Kate Mace et al.): Network challenges
• Data Acquisition Group (Kevan Anderson et al.): Data and Metadata access (NEXUS), development of a python/scripting interface to allow for analysis driven control of the beamline
• CAMERA (Jamie Sethian et al.): Analysis, Xi-CAM, cam link, HPC API, sharp and many more
Development of the Data Movement in Collaboration with other Divisions of LBNL ADVANCED LIGHT SOURCE
-
GIWAXS scattering on printed OPVs
Design Model
9
-
10
Full data lifecycle: Xi-CAM-Link [Data]
remote
event loopframe
grabber
exp.
control
remote
event loop
process
images
ptycho
SHARP
Compute clusterExperiment /
data acquisition
Execution steps
1.Identify and setup resources
2.Launch services
3.Connect network
4.Execute graph
Remote Storage/Tape Archive• Compute• Publishing• Archiving
Edge Services –DataBroker: Event Based• Metadata• Provenance• Data TrackingMongoDB• Workflow State• Data Cataloging• Data Tagging• LoggingCAM-Mart - Algorithm Marketplace
Local StorageQC, Triage, Local Analysis (FFB)
BROOKH \J ,.\ I I Or,,; ,:\ I AVEN , l ,\ I\ ( l [{ ,.\ I ( l Ry
-
11
Full data lifecycle: Xi-CAM-Link [Analysis]
Graphical User Interface
(GUI)
remote
event loopframe
grabber
exp.
contro
l
remote
event loop
process
images
ptych
o
SHAR
P
User / local computer
Compute clusterExperiment /
data acquisition
Execution steps
master
event loop
remot
e
contro
l
elog
1.Identify and setup resources
2.Launch services
3.Connect network
4.Execute graph
CAM-Link
• Compression• Preprocessing• Analysis close to detector
Python-based• Perform Queries,
Searches• Programmable
Pipelines• Visualization
Python-based• Remote Workers• Support
Docker/Singularity (WF Provenance)
. /11!i1/ ' -- ,., .. l I ~ - , __ ... - , . """ '! .i ;. ~ - (~-
~-- .-;.,: ~i. . , : .... " ' ...... ,. - . .. . ' 0 .
ca
-
Deployment of Mathematical Algorithms: Xi-cam
Scientific Achievement
Development of a community-maintainable
platform for new analysis and visualization
techniques for synchrotrons.
Research Details
–Remote processing with HPC for real-time
high data rate analysis
–remote data access for high-volume data
retrieval
–Highly interactive design
TomographyTime resolved SAXSElectron MicroscopyNEXAFSiPythonGlobus OnlineDatabroker
Remote ComputingGISAXS HIPGISAXS GIWAXS simulatorReverse Monte CarloCD-GISAXSPlug-in Store
Xi-CAM plug-ins
BNL SMI Beamline
Result of DOE Early Career Awardcam
APS 2-BM APS 8-1D-E BNL 11-BM
Batch H1pGISAXS I Log I llmel1ne I Tomography View er I 30 View er
= ~
vuo:n 2m 00101.edf Yll031-2m -00108.edf YllOJl - 2m - 00109.edf YL1031-2m-00110.edf 'r'L1031-2m-00111.edf YL1031-2m-00112.edf Yl1031-2m-00113.edf Yll031=2m~00114.edf
/ 0 1-ccU--',lllJ.!. q-11 lllh 1\ q.,.. \J OUh A (I ----c(l 111
~ A+ B A · B
Xi-cam: A versatile interface for data visualization and analysis Ronald J. Pandolfi/* Daniel B. Allan/ Elke Arenholz;1 Luis Barroso•Luque/ Stuart 1. Campbell / Thomas A. Caswell/ Austin Blair/ Francesco De Carlo,C Sean Fackler/ Amanda P. Foumier,b Guillaume Freychet/ Masa• fumi Fukuto/' Doga Giirsoy,C·h Zhang Jiang/ Harinarayan Krishnan/ Dinesh Kumar/ R. Joseph Kline,& Ruipeng Li/ Christopher Liman,s Stefano Marchesini,' Apurva Mehta," Alpha T. N' Diaye,' Dilworth (Dula) Y. Parkinson/ Holden Parks/ Lenson A. Pellouchoud,a Talita Perciano/ Fang Ren,b Shreya Sahoo/ Joseph Stnalka,C Daniel Sunday,s Christo-pher J. Tassone,;a Daniela Ushizima,;a SinganallurVenkatakrishna.n,d Kevin G. Yager/ James A. Sethian;a and Alexander Hexeme~
H1pRMC I !Python
Calibrant ~teri.! t,,;~.'
A:1 ,\.I rr
-
Scientific Aims
Efficient platform for rapid
parallel screening and materials
discovery.
Data-driven feedback directly
drives material synthesis in-situ.
Significance and Impact
Prototype system at
GIWAXS/XRD beamline at
SSRL for development.
Material properties and
characterizations computed on-
the-fly; ~1 second per sample.
On-the-fly Data Assessment for High-Throughput XRD
http://pubs.acs.org/doi/pdf/10.1021/acscombsci.7b00015
Collaboration with C. Tassone (SSRL) and M. Apurva (SSRL)
-
14
Remote Execution
GISAXS Workshop in Bayreuth in collaboration with CAMERA
~ UNIVERSITAT l[Lj BAYREUTH GISAS
Summer School
2018
I 1 I ~IIL ~~~ , ,,, I
,,
,,
.l ••••
File Plug1ns Test:n9 Active Sess ion (loca lhost)
Batch I H1pGISAXS I !Python I L□Q I limel1ne I Tom□Qraphy I Viewer I 30 Viewer
Welcome ta Xi-cam
© 0 / • i5 bin • ii boot • ii dev • ii etc • ii home • iim lib
~ A+ B A · B A / 8 AVG
-
The High-throughput NEXAFS workflowcollaboration with Materials Project
15
bl 6.3.1
Materials Project website
Quantities
Processing/reduction
AnalysisResults
Public
Xi-CAM
soon
MPCantribs Framework
! ::I
,.
V
Co
i I cam
T Harooss111g tho powlll' of suporcompuung ond stnto-of tho ort Dklctton
he suue1,mm01hod,th0Matcoa•P,01ectp,0"""-·•·ba""' < M t • I to computod information on known oNt p,edictcd m.itonnts as wotl ::cess
a er1a S powerfulanat-,sistoo1Sto1nsl)lreanddesioo-., Project ..-;;;;:=, ·~,.,m"'""' ~ io .-. ... nu
JO~OU IIIAIUIAU -----·--·--
0 ~~ou•u1HIU
,.. __ ----•Gil __ .. ---
.. ~ '115!,lt.l.lU
•••t,un
-
Deep learning for X-ray Scattering
Ushizima, Kumar, Venkatakrishnan, Hexemer. “At the convergence of GISAXS and deep learning”, IX CANSAS Conference, Berkeley and Menlo Park, CA, Jun 2017.Ushizima, “High Throughput Reverse Image Search: Quantification, Search, Retrieval and Ranking for Multi-modal Imaging”, BNL NSLS User Meeting, NY, May 2017.
Queries Best matches
Input : 2D image of cristal diffraction (cubic, hexagonal … packing)
Example 1: Classify nanostructures by packing
Generate a model which predict with 80% accuracy the different space groups
Example 2: Classify between SAXS/GISAXS data
Example 3: Using Fourier transform to improve learning
Reduce image size for classification in Fourier space to limit the lost of information
1024 * 1024128 * 12816* 16
Truncate frequencies in the Fourrier space to retain more information
Automatize the data classification on beamlines (7.3.3 beamline)
Generate a model which predict with 98% accuracy the different space groups
Collaboration K.Yager and M. Fukuto (NSLS2) Rippel et al. (2015)
Method Original Data Set
Noisy Data
Convolutional Neural Networks
92.3% 91.6%
Histogram of Oriented Gradients
92% 80%
50
100
150
50 100 150
Max pooling
Spectral pooling
-
Use Case in Development: X-ray Photon Correlation Spectroscopy (XPCS)
M. Sutton, C. R. Physique, 9, 657 (2008).
G. Grübel and F. Zontone, J. Alloys Compd. 362, 3 (2004).
The g2 curve has time information at the specific q-value (length scale).
t
Time delay
Corr
ela
tion
Speckle motion at length scale dn of sample.
New Beamline development
Interface: Xi-CAM
HPC Code for XPCS
Control: Bluesky and DataBroker
. .. -3 . .. . . . .. . .. .. .. ' .t +:
.... . . . . . . . ... . . . . ,, . . .. •I . . . ... o •• . . .. .. ., . . . .. . . . . . . .. . .
1.15 r--~--g-,-:-(q~,,--;-)- =-·. 1
i. " I' 1M
(l(q,t )l(q,t H ))
(I(q,1) )2 ... 10' 10' frame delay
10'
' I I . \; I ~ - t I . r • . \ , .
( \ . \ , •· -I' ; ' ~ , ·, -r ~"'
.• • \.> . .Ir
ADVANCED LIGHT SOURCE
BROOKHAVEN . 'J\1 101\'\ I I \IIOR\IOR)
-
Future of Creative Discovery
Next Generation of Creative Tools for Analysis• Visual and Interactive• Multi modal
• Physics and Math engine by design
• Full integration of ML
KAREN ROSS Secretary of the California Department of Food and
Agriculture
Rick Perry Explores Tomography Data
Spring 2018
-
Thanks
-
"Any opinions, findings, conclusions or recommendations
expressed in this material are those of the author(s) and do not
necessarily reflect the views of the Networking and Information
Technology Research and Development Program."
The Networking and Information Technology Research and Development
(NITRD) Program
Mailing Address: NCO/NITRD, 2415 Eisenhower Avenue, Alexandria, VA 22314
Physical Address: 490 L'Enfant Plaza SW, Suite 8001, Washington, DC 20024, USA Tel: 202-459-9674,
Fax: 202-459-9673, Email: [email protected], Website: https://www.nitrd.gov
(it,NITRD -
mailto:[email protected]://www.nitrd.gov/