BIG DATA ANALYTICS
LARGE HADRON COLLIDER
Manuel Martín Márquez, Senior Project Leader
CERN – IT Department
CERN EUROPEAN ORGANIZATION FOR NUCLEAR RESEARCH
A WORLDWIDE COLLABORATION
FUNDAMENTAL RESEARCH
WHAT IS THE UNIVERSE MADE OF?
HOW DID IT START?
FUNDAMENTAL RESEARCH
Why do particles have mass?
FUNDAMENTAL RESEARCH
Why is there no antimatter left in the Universe?
What is 95% of the Universe made of?
Manuel Martin Marquez
Intel IoT Ignition Lab – Cloud and Big Data
Munich, September 17th
CERN’s PARTICLE ACCELERATORS AND
EXPERIMENTS
12/6/2019 Document reference
CERN Aerial View
World’s largest scientific instrument: 27 km (16.8 miles) circumference, 6000+ superconducting magnets
Emptiest place in the solar system: high vacuum inside the magnets
Hottest spot in the galaxy: lead-ion collisions create temperatures 100 000x hotter than the heart of the sun
Fastest racetrack on Earth: protons circulate 11245 times/s (99.9999991% of the speed of light)
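The revolution rate follows from the ring length and the proton speed. A minimal check, assuming a ring circumference of 26 659 m (the slide rounds this to 27 km) and using a hypothetical helper named revolutions_per_second:

```python
SPEED_OF_LIGHT_M_S = 299_792_458   # exact by SI definition
RING_CIRCUMFERENCE_M = 26_659      # assumption: precise value behind the rounded "27 km"
BETA = 0.999999991                 # proton speed as a fraction of c, from the slide


def revolutions_per_second(circumference_m, beta):
    """Number of laps per second for a particle moving at beta * c."""
    return beta * SPEED_OF_LIGHT_M_S / circumference_m


turns = revolutions_per_second(RING_CIRCUMFERENCE_M, BETA)
print(f"{turns:.0f} turns per second")
```

With the rounded 27 km figure the same formula gives roughly 11 100 turns per second, so the quoted 11245 is consistent with the more precise circumference.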
150 million sensors
Control and detection sensors
Massive 3D camera
Capturing millions of collisions per second
CMS Detector
FUTURE CHALLENGES
HIGH-LUMINOSITY LHC
FUTURE CIRCULAR COLLIDER
Post-LHC accelerator projects (80-100 km)
CERN’s BIG DATA AND MACHINE LEARNING
HIGH ENERGY PHYSICS
TRIGGER SYSTEM – FILTERING EVENTS & DATA
The trigger system selects approximately 1000 of the 1.7
billion collisions that occur each second in the centre of the
ATLAS detector.
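To put those two figures in proportion, the short calculation below derives the kept fraction and the rejection factor. It uses only the numbers quoted above; the variable names are illustrative.

```python
COLLISIONS_PER_SECOND = 1.7e9     # from the slide
EVENTS_KEPT_PER_SECOND = 1_000    # from the slide ("approximately 1000")

kept_fraction = EVENTS_KEPT_PER_SECOND / COLLISIONS_PER_SECOND
rejection_factor = COLLISIONS_PER_SECOND / EVENTS_KEPT_PER_SECOND

print(f"kept fraction: {kept_fraction:.2e}")
print(f"about 1 event kept per {rejection_factor:,.0f} collisions")
```

In other words, roughly one collision in 1.7 million survives the trigger.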
TRIGGER SYSTEM – FILTERING EVENTS & DATA
Improve efficiency, flexibility and quality of filtering
Remove the costly and rigid hardware-based model
Reduce false-positive rates
From rule-based systems to Deep Learning classifiers
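The difference between a fixed rule and a learned score, and how the false-positive rate is measured for each, can be sketched on toy data. Everything here is hypothetical: the events, the feature names energy and isolation, the threshold, and the hand-picked (not trained) weights. It shows how the metric is computed, not how the actual trigger performs.

```python
import math


def false_positive_rate(decisions, labels):
    """Fraction of background events (label 0) that were wrongly accepted."""
    false_pos = sum(1 for d, y in zip(decisions, labels) if d and y == 0)
    negatives = sum(1 for y in labels if y == 0)
    return false_pos / negatives if negatives else 0.0


def rule_based_trigger(event, threshold=50.0):
    """Hypothetical fixed cut: accept when 'energy' exceeds a threshold."""
    return event["energy"] > threshold


def score_based_trigger(event, weights, bias, cut=0.5):
    """Hypothetical classifier: logistic score over several features."""
    raw = bias + sum(weights[name] * event[name] for name in weights)
    return 1.0 / (1.0 + math.exp(-raw)) > cut


# Toy events: label 1 = interesting signal, label 0 = background.
events = [
    {"energy": 80.0, "isolation": 0.90, "label": 1},
    {"energy": 60.0, "isolation": 0.80, "label": 1},
    {"energy": 45.0, "isolation": 0.95, "label": 1},
    {"energy": 70.0, "isolation": 0.10, "label": 0},
    {"energy": 55.0, "isolation": 0.20, "label": 0},
    {"energy": 30.0, "isolation": 0.30, "label": 0},
    {"energy": 20.0, "isolation": 0.10, "label": 0},
]
labels = [e["label"] for e in events]

# Hand-picked weights for illustration only; a real model would learn them.
weights = {"energy": 0.05, "isolation": 6.0}
bias = -5.5

rule_fpr = false_positive_rate([rule_based_trigger(e) for e in events], labels)
model_fpr = false_positive_rate(
    [score_based_trigger(e, weights, bias) for e in events], labels
)
print(f"rule-based FPR: {rule_fpr:.2f}, score-based FPR: {model_fpr:.2f}")
```

On this toy set the single-feature cut accepts two of the four background events, while the two-feature score rejects all four.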
TRIGGER SYSTEM – ML PIPELINE
Data Ingestion: complex datasets, 801x19 matrix, files of about 4 TB
Feature Preparation: data format preparation, 19 original features, 14 high-level features (HLF) derived from domain knowledge
Model Development: feed-forward DNN, recurrent DNN (GRUs), combined models
Training: hyper-parameter tuning, Scikit-learn, Keras and Spark (parallel), distributed training
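To make the recurrent branch concrete, the sketch below runs a single GRU cell over one event, assuming the 801x19 matrix is a sequence of 801 rows with 19 features each. The weights are random and untrained, the hidden size of 8 and the names GRUCell, step and run are illustrative, and the update rule follows one common convention (h = (1 - z) * h + z * candidate). The slides name Keras for the real models; this pure-Python version only shows the mechanics.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Minimal GRU cell with untrained random weights (illustrative only)."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        # One (input weights, recurrent weights, bias) triple per gate.
        self.params = {
            gate: (
                random_matrix(n_hidden, n_inputs, rng),
                random_matrix(n_hidden, n_hidden, rng),
                [0.0] * n_hidden,
            )
            for gate in ("update", "reset", "candidate")
        }

    def _affine(self, gate, x, h):
        w, u, b = self.params[gate]
        return [a + c + d for a, c, d in zip(matvec(w, x), matvec(u, h), b)]

    def step(self, x, h):
        z = [sigmoid(v) for v in self._affine("update", x, h)]
        r = [sigmoid(v) for v in self._affine("reset", x, h)]
        gated_h = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(v) for v in self._affine("candidate", x, gated_h)]
        # Blend the previous state and the candidate state.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def run(self, sequence):
        """Consume a whole sequence and return the final hidden state."""
        h = [0.0] * self.n_hidden
        for x in sequence:
            h = self.step(x, h)
        return h


N_ROWS, N_FEATURES = 801, 19  # shape quoted on the slide
rng = random.Random(42)
# Synthetic stand-in for one event; real data would come from the ingestion stage.
event = [[rng.gauss(0, 1) for _ in range(N_FEATURES)] for _ in range(N_ROWS)]

cell = GRUCell(n_inputs=N_FEATURES, n_hidden=8)
summary = cell.run(event)
print(len(summary), "hidden values summarise the event")
```

In a combined model, a fixed-size summary like this could be concatenated with the 14 high-level features and passed to a feed-forward classifier.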
TRIGGER SYSTEM – SCORING
Strict latency constraints: target the Level-1 trigger
Larger networks with longer latency: neutrino and astronomical experiments, industrial applications, etc.
FPGAs: provide huge flexibility and allow us to cope with the required response time
Performance depends on how well you take advantage of them
QUANTUM COMPUTING – ANOTHER WEAPON?
Can Quantum Computing and Q-ML help?
Quantum nearest-neighbors clustering, PCA and SVM
But it is still hard to:
Get access to emulators and simulators
Get access to real devices, benchmark and compare results
Engineering aspects of a QC installation, such as cryogenics and material science
Use cases:
Track reconstruction in dense environments
Reconstruct neutrino interactions
Optimize Grid workflow
CERN’s BIG DATA AND MACHINE LEARNING
CERN CONTROL SYSTEMS AND IOT
CERN ACCELERATOR LOGGING SERVICE
More than 2 million signals produce over 2.5 TB of data per day.
Values range from scalars to arrays of up to 4 million elements.
Data diverse in nature:
Accelerator running modes,
Equipment statuses,
Magnet currents,
Cryogenics temperatures,
Particle beam positions
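A back-of-the-envelope calculation turns the daily volume into an average ingest rate. It treats the quoted figures as lower bounds and assumes decimal terabytes (1 TB = 10^12 bytes); the variable names are illustrative.

```python
SIGNALS = 2_000_000          # "+2M signals" on the slide, so a lower bound
BYTES_PER_DAY = 2.5e12       # "more than 2.5TB", assuming decimal terabytes
SECONDS_PER_DAY = 24 * 60 * 60

avg_bytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY
avg_bytes_per_signal_per_day = BYTES_PER_DAY / SIGNALS

print(f"average ingest rate: {avg_bytes_per_second / 1e6:.1f} MB/s")
print(f"average per signal: {avg_bytes_per_signal_per_day / 1e6:.2f} MB/day")
```

These are averages only; since values range from scalars to very large arrays, individual signals will differ widely from the mean.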
CERN ACCELERATOR LOGGING SERVICE
Control I-IoT data at CERN is dispersed across several data silos
The current system is optimized for real-time serving but not for data exploration:
Finding hidden correlations
Anomaly detection
Post-mortem analysis
Root cause analysis (RCA)
Intelligent alarm systems
Etc.
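As one example of the exploration tasks listed above, anomaly detection on a single logged signal can be sketched with a rolling z-score. This is a generic technique chosen for illustration, not the method used at CERN; the function name, window size, threshold and sample values are all hypothetical.

```python
from collections import deque
from statistics import mean, stdev


def rolling_zscore_anomalies(samples, window=5, threshold=3.0):
    """Return indices of samples that deviate from the mean of the previous
    `window` samples by more than `threshold` standard deviations."""
    history = deque(maxlen=window)
    anomalies = []
    for index, value in enumerate(samples):
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                anomalies.append(index)
        history.append(value)
    return anomalies


# Hypothetical magnet-current readings with one obvious spike.
magnet_current = [100.0, 100.2, 99.9, 100.1, 100.0,
                  100.1, 99.8, 112.0, 100.0, 100.2]

flagged = rolling_zscore_anomalies(magnet_current)
print(flagged)
```

Note that a flagged sample still enters the trailing window, which inflates the standard deviation for the next few readings; a production detector would likely exclude or down-weight it.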
ENHANCE EXPLORATION - AUTONOMOUS TECH
Oracle Autonomous Strategy
Automated creation of required resources, administration, patches, backups, memory handling
Flexibility in provisioning and scaling
Easy solution prototyping
Move fast from PoC to production
Cost-effective solutions
Hybrid systems integrating DB and external object storage
DATA INTERFACE – NOTEBOOKS
The human factor
SWAN
Jupyter notebooks
Integrated with CERN
GPUs – Cloud
Oracle collaboration