FINDING THE “HIGGS” IN THE HAYSTACK(S)
Stephen J. Gowdy (CERN) 12th September 2012 XLDB Conference
Overview
Large Hadron Collider (LHC)
Compact Muon Solenoid (CMS) experiment
The Challenge
Worldwide LHC Computing Grid (wLCG)
Data Organisation
Analysis Techniques
Databases
Future Trends
12th September 2012 Finding the "Higgs" in the Haystack(s) 2
Large Hadron Collider
a hadron is a composite particle made of quarks
Big machine characteristics
17 mile circular tunnel, 100m underground, straddling the French-Swiss border
Protons currently travel at 99.9999964% of the speed of light
Each proton enters CH (Switzerland) over 11,000 times a second
Will not reach design beam energy until 2014
Interactions potentially every 25ns (40MHz)
Each interaction has multiple collisions
Called “pileup”, currently around 30 collisions per event
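The "over 11,000 times a second" and 40MHz figures follow from simple arithmetic. A minimal sketch, assuming the LHC circumference of about 26,659 m (not stated on the slide) and the proton speed quoted above:

```python
# Back-of-envelope LHC timing (a sketch; the circumference is an assumption, not from the slide).
C_LIGHT = 299_792_458.0       # speed of light in m/s
LHC_CIRCUMFERENCE = 26_659.0  # metres, roughly the 17 miles quoted above
BETA = 0.999999964            # proton speed as a fraction of c, from the slide
BUNCH_SPACING = 25e-9         # seconds between potential interactions


def revolution_frequency(beta: float = BETA, circumference: float = LHC_CIRCUMFERENCE) -> float:
    """Turns per second around the ring: f = beta * c / circumference."""
    return beta * C_LIGHT / circumference


def crossing_rate(spacing: float = BUNCH_SPACING) -> float:
    """Potential interaction rate in Hz for a given bunch spacing."""
    return 1.0 / spacing


if __name__ == "__main__":
    print(f"{revolution_frequency():,.0f} turns per second")  # just over 11,000
    print(f"{crossing_rate() / 1e6:.0f} MHz")                 # 40 MHz
```

Since each turn crosses the French-Swiss border, the revolution frequency is also the rate at which a proton enters Switzerland.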
Accelerator Complex
Older machines feed newer machines
LHC Protons start in LINAC2 then go to the PS via the BOOSTER
From the PS they are injected to the SPS
Injected to LHC at 450GeV
Accelerated to 4TeV in LHC
Need to have “fills” ~1/day
[Map: the LHC and SPS rings, with CMS and the CERN Main Site marked]
Compact Muon Solenoid
a muon is a (comparatively) long-lived big brother to the electron
Particle Identification 101
Trigger Architecture
[Diagram: trigger architecture. Calorimeter side: matching “Trigger Towers” in ECAL and HCAL (ET), electron isolation, jet detection, sorting, ETmiss, ETtot. Muon side: track segments in endcap and barrel over 0.8 < |η| < 2.4, |η| < 1.2 and |η| < 2.1; for endcap and barrel: pT, η, φ, quality; ≤ 4 candidates. Final decision, partitioning; interface to TTC and TTS (Trigger Throttling System).]
Data Rates
RAW (i.e. unprocessed) data is ~1MB/event
Potential detector acquisition rate: 1MB × 40MHz = 40TB/s
The actual data volume would be much larger, but not all detectors are able to read out at 40MHz
Hardware trigger decision allows a 100kHz rate
Looks at individual detectors to make a fast choice
Data rate up to 100GB/s
High Level Trigger done on the filter farm
Output rate is nominally 300Hz ≈ 300MB/s
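The rates above multiply out directly. A small sketch, assuming a flat 1MB per event at every stage (in practice event size varies from event to event):

```python
# Data volume at each stage of the trigger chain, assuming ~1MB per event throughout.
EVENT_SIZE_BYTES = 1e6  # ~1MB RAW event (decimal units, as on the slide)

STAGE_RATES_HZ = {
    "bunch crossings": 40e6,       # potential interactions every 25ns
    "hardware trigger": 100e3,     # fast decision on individual detectors
    "High Level Trigger": 300.0,   # nominal filter-farm output
}


def bandwidth_bytes_per_s(rate_hz: float, event_size: float = EVENT_SIZE_BYTES) -> float:
    """Bytes per second produced at a given event rate."""
    return rate_hz * event_size


def rejection_factor(rate_in_hz: float, rate_out_hz: float) -> float:
    """How many input events are examined per event kept."""
    return rate_in_hz / rate_out_hz


if __name__ == "__main__":
    for stage, rate in STAGE_RATES_HZ.items():
        print(f"{stage:>20}: {bandwidth_bytes_per_s(rate) / 1e9:>9,.1f} GB/s")
    print(f"overall rejection: 1 in {rejection_factor(40e6, 300.0):,.0f}")
```

The overall reduction from 40MHz to 300Hz is roughly a factor of 130,000.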
The Challenge
why it isn’t easy
A “Higgs” event
A Haystack
40 reconstructed vertices, high pileup run, 25th October 2011
Haystacks
So that was one event
2012 average is 30 collisions per event
By the end of 2012 will have almost 7 billion events recorded
After the reduction of 40MHz to O(300Hz)
Doesn’t include simulated data
Looking for a half million Higgs particles
Assuming predicted cross sections are correct
Many are much, much harder to find than 4 muons
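For a sense of scale, the numbers on this slide can be combined. This is only an illustration: it treats the half million as Higgs bosons produced, and the "one per N events" figure is a best case that pretends every one of them landed in a recorded event, which the trigger does not achieve.

```python
# Scale of the haystack, using only the figures quoted on the slide.
EVENTS_RECORDED = 7e9        # "almost 7 billion events" by the end of 2012
COLLISIONS_PER_EVENT = 30    # 2012 average pileup
HIGGS_PRODUCED = 5e5         # "a half million", assuming predicted cross sections

collisions_recorded = EVENTS_RECORDED * COLLISIONS_PER_EVENT
# Best case: pretend every Higgs boson ended up in a recorded event.
events_per_higgs_best_case = EVENTS_RECORDED / HIGGS_PRODUCED

print(f"{collisions_recorded:.1e} collisions in recorded events")
print(f"best case: one Higgs per {events_per_higgs_best_case:,.0f} recorded events")
```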
Worldwide LHC Computing Grid (wLCG)
like an electric grid that supplies computing power
Tiered System
Tier-0 at CERN: data gets “sorted” and has its first-pass reconstruction
Tier-1 centres: CMS has seven large regional facilities
Provide custodial tape storage
Large scale re-reconstruction
Tier-2 centres: frequently universities or groups of universities
Simulation
End user analysis
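The division of labour between tiers can be written down as a lookup table. A toy sketch: the role strings are paraphrased from the slides (including the prompt skims run at Tier-1), and the helper `tiers_for` is hypothetical:

```python
# Responsibilities per tier, paraphrased from the slides; purely illustrative.
TIER_ROLES: dict[str, set[str]] = {
    "Tier-0": {"sorting", "first-pass reconstruction"},
    "Tier-1": {"custodial tape storage", "re-reconstruction", "prompt skims"},
    "Tier-2": {"simulation", "end user analysis"},
}


def tiers_for(role: str) -> list[str]:
    """Return the tiers (in order) that carry out a given role."""
    return [tier for tier, roles in TIER_ROLES.items() if role in roles]


print(tiers_for("simulation"))  # ['Tier-2']
```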
Schematic
[Diagram: the CMS Detector feeds the Filter Farm and the Tier-0 at CERN; data flow on to Tier-1 centres (Fermilab, IN2P3, ASGC, KIT, CNAF), Tier-2 centres (Florida, UCSD) and Tier-3 (UCLA, MyLaptop)]
LHCOPN (Optical Private Network)
[Plot: traffic on a CERN holiday; CMS is green]
Resources
CPU (kHS06): Tier-0 121 (21%), Tier-1 137 (23%), Tier-2 324 (56%); total 582kHS06 ≈ 150kSi2k
Disk (TB): Tier-0 4800 (9%), Tier-1 21000 (40%), Tier-2 27000 (51%); total 51800TB
Tape (TB): Tier-0 23000 (33%), Tier-1 47000 (67%); total 90000TB
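Whole-number percentages like these have to sum to 100, which plain rounding does not guarantee (137/582 is 23.54%, which rounds to 24%). One scheme that reproduces all the percentages above from the per-tier values is the largest-remainder method. Whether the slides used it is an assumption, and this sketch takes each share relative to the sum of the listed tiers:

```python
import math


def largest_remainder_percentages(amounts: dict[str, float]) -> dict[str, int]:
    """Whole-number percentages summing to 100 (largest-remainder / Hamilton method)."""
    total = sum(amounts.values())
    exact = {key: 100 * value / total for key, value in amounts.items()}
    result = {key: math.floor(share) for key, share in exact.items()}
    leftover = 100 - sum(result.values())
    # Give the remaining points to the entries with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda key: exact[key] - result[key], reverse=True)
    for key in by_remainder[:leftover]:
        result[key] += 1
    return result


cpu_khs06 = {"Tier-0": 121, "Tier-1": 137, "Tier-2": 324}
print(largest_remainder_percentages(cpu_khs06))
# {'Tier-0': 21, 'Tier-1': 23, 'Tier-2': 56}
```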
Data Organisation
lining up the bytes in a consumable order
Data Tiers
“Streamer” files written to disk by filter farm
Read and reorganised into Primary Datasets (PD)
Based on trigger selections (physics motivation)
Output is the custodial RAW data
Reconstruction run on RAW PDs
Output RECO and AOD (Analysis Object Data)
Simulation also produces similar data tiers plus truth information
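The streamer-to-PD step groups events by which trigger selections fired. A minimal sketch: the PD names and trigger path names are hypothetical, and it assumes an event may be written to more than one PD if it fires triggers belonging to several:

```python
# Hypothetical mapping from primary dataset (PD) to the trigger paths that feed it.
PD_TRIGGER_PATHS: dict[str, set[str]] = {
    "MuonPD": {"HLT_MuPathA", "HLT_MuPathB"},
    "ElectronPD": {"HLT_ElePathA"},
    "JetPD": {"HLT_JetPathA"},
}


def route_to_primary_datasets(events: list[dict]) -> dict[str, list[int]]:
    """Assign each streamer event to every PD whose trigger paths it fired."""
    pds: dict[str, list[int]] = {name: [] for name in PD_TRIGGER_PATHS}
    for event in events:
        for name, paths in PD_TRIGGER_PATHS.items():
            if paths & event["fired"]:  # any overlap between fired paths and this PD's paths
                pds[name].append(event["id"])
    return pds


streamer = [
    {"id": 1, "fired": {"HLT_MuPathA"}},
    {"id": 2, "fired": {"HLT_MuPathB", "HLT_JetPathA"}},
    {"id": 3, "fired": {"HLT_ElePathA"}},
]
print(route_to_primary_datasets(streamer))
# {'MuonPD': [1, 2], 'ElectronPD': [3], 'JetPD': [2]}
```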
Data Ordering
ROOT used as persistency framework
Adjust the ordering of data in files depending on the expected reading pattern
RAW & RECO expected to be read as whole events
Ordering in file is by event
AOD may have only a subset of its data read
Frequent passes over a single variable to make plots
[Figure: data ordered by attribute, with Attribute 1 … Attribute 4 each stored for events 1, 2, 3, …, n]
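The difference between the two orderings can be shown with an in-memory toy. This is only an analogy for what ROOT does on disk, and the attribute names are hypothetical: with event ordering, plotting one variable means striding through every event's block, while attribute ordering keeps each variable contiguous.

```python
# Toy events with hypothetical attributes.
events = [
    {"pt": 45.0, "eta": 0.3, "phi": 1.2, "charge": 1},
    {"pt": 12.5, "eta": -1.1, "phi": -2.0, "charge": -1},
    {"pt": 30.2, "eta": 2.0, "phi": 0.4, "charge": 1},
]
ATTRIBUTES = list(events[0])  # ["pt", "eta", "phi", "charge"]

# Ordered by event (suits RAW/RECO whole-event reads): event 1's attributes, then event 2's, ...
by_event = [event[name] for event in events for name in ATTRIBUTES]

# Ordered by attribute (suits single-variable passes): attribute 1 for events 1..n, then attribute 2, ...
by_attribute = {name: [event[name] for event in events] for name in ATTRIBUTES}


def read_variable_by_event(flat: list, name: str) -> list:
    """Stride through every event's block to pick out one attribute."""
    return flat[ATTRIBUTES.index(name)::len(ATTRIBUTES)]


def read_variable_by_attribute(columns: dict, name: str) -> list:
    """The whole attribute is already one contiguous list."""
    return columns[name]


print(read_variable_by_event(by_event, "pt"))          # [45.0, 12.5, 30.2]
print(read_variable_by_attribute(by_attribute, "pt"))  # [45.0, 12.5, 30.2]
```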
Skims
“Train” model for event selection
Various analyses include their own event selections
Selection done using reco output
More detailed and accurate than trigger info
Can cut a lot harder
First skims done at Tier-1 on the Tier-0 output
Called PromptSkims as they are started as soon as possible
Currently write out 81 datasets from Tier-0 output
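A minimal sketch of the “train” idea: events are read once and every analysis's selection (a “wagon”) is applied in the same pass, each filling its own skim. The selection names, event fields and cut values are hypothetical:

```python
from typing import Callable

Event = dict[str, float]


def run_skim_train(events: list[Event],
                   selections: dict[str, Callable[[Event], bool]]) -> dict[str, list[Event]]:
    """Single pass over the events; each selection fills its own output skim."""
    skims: dict[str, list[Event]] = {name: [] for name in selections}
    for event in events:                         # read each event once...
        for name, passes in selections.items():  # ...and offer it to every wagon
            if passes(event):
                skims[name].append(event)
    return skims


# Hypothetical selections on reconstructed quantities (tighter than the trigger could apply).
SELECTIONS: dict[str, Callable[[Event], bool]] = {
    "HighPtMuon": lambda ev: ev["muon_pt"] > 20.0,
    "FourLepton": lambda ev: ev["n_leptons"] >= 4,
}

reco_events = [
    {"muon_pt": 35.0, "n_leptons": 4},
    {"muon_pt": 8.0, "n_leptons": 1},
    {"muon_pt": 22.0, "n_leptons": 2},
]
skims = run_skim_train(reco_events, SELECTIONS)
print({name: len(selected) for name, selected in skims.items()})
# {'HighPtMuon': 2, 'FourLepton': 1}
```

Sharing one read across all selections is what makes the train cheaper than each analysis scanning the full dataset on its own.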
Datasets
Files are collected in datasets
Datasets should be processed together
This actually uses a database (Oracle)
Each dataset has provenance attached to it
Can be superseded by a reprocessing
End user tool queries database and creates jobs to process it
Typically across all the Tier-2s hosting the dataset
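The query-then-split step might look roughly like the sketch below. It is only an illustration: the site names, file names and the `files_per_job` parameter are hypothetical, and a production tool would need a smarter placement policy than this simple round-robin.

```python
from itertools import cycle


def make_jobs(files: list[str], hosting_sites: list[str], files_per_job: int) -> list[dict]:
    """Split a dataset's files into jobs and spread them round-robin over hosting sites."""
    if not hosting_sites:
        raise ValueError("dataset is not hosted at any site")
    if files_per_job < 1:
        raise ValueError("files_per_job must be at least 1")
    chunks = [files[i:i + files_per_job] for i in range(0, len(files), files_per_job)]
    sites = cycle(hosting_sites)
    return [{"site": next(sites), "files": chunk} for chunk in chunks]


dataset_files = [f"file_{n}.root" for n in range(5)]
for job in make_jobs(dataset_files, ["T2_Site_A", "T2_Site_B"], files_per_job=2):
    print(job["site"], job["files"])
```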
Analysis Techniques
narrowing the haystacks
Discriminating Variables
Each analysis will find the variables that enhance its signal-to-noise ratio
A high-energy muon is an easy one, i.e. something going really fast doesn’t bend so much in the magnetic field
May end up losing a lot of signal to reduce the background by a larger factor
Optimise S/√B or S/√(S+B)
[Plot: momentum of muon (GeV), 0 to 60, showing pseudo data, background and signal]
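Optimising S/√B or S/√(S+B) can be sketched as a scan over a lower cut on muon momentum. The histogram counts below are invented toy numbers, not CMS data. With them, the two figures of merit pick different cuts, and the S/√(S+B) optimum keeps 24 of 31 signal events while cutting the background from 206 to 16:

```python
import math
from typing import Callable

# Invented toy counts per 10 GeV bin of muon momentum (not real data).
BIN_LOW_EDGES_GEV = [0, 10, 20, 30, 40, 50]
SIGNAL = [1, 2, 4, 8, 10, 6]
BACKGROUND = [100, 60, 30, 10, 4, 2]


def s_over_sqrt_b(s: float, b: float) -> float:
    return s / math.sqrt(b) if b > 0 else math.inf


def s_over_sqrt_s_plus_b(s: float, b: float) -> float:
    return s / math.sqrt(s + b) if s + b > 0 else 0.0


def best_lower_cut(merit: Callable[[float, float], float]) -> tuple[int, float]:
    """Scan 'momentum >= edge' cuts and return (edge, merit) with the largest merit."""
    best_edge, best_value = BIN_LOW_EDGES_GEV[0], -math.inf
    for k, edge in enumerate(BIN_LOW_EDGES_GEV):
        value = merit(sum(SIGNAL[k:]), sum(BACKGROUND[k:]))
        if value > best_value:
            best_edge, best_value = edge, value
    return best_edge, best_value


print(best_lower_cut(s_over_sqrt_s_plus_b)[0])  # 30
print(best_lower_cut(s_over_sqrt_b)[0])         # 40
```

S/√B ignores the signal's own fluctuations, so it tends to favour tighter cuts than S/√(S+B) when the surviving background is small.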
Multivariate Analysis
Many different types
Simple rectangular cuts (multiple 1-d cuts)
Maximum Likelihood approaches: combine the probability of all input variables
Fisher Discriminants: input variables are projected to another space to avoid correlations
Neural Networks
Most of these methods rely on training
Some packages can apply many methods
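As one concrete example of a trained method, a two-variable Fisher discriminant fits in a few lines. The weights are w = S_W⁻¹(μ_s − μ_b), where S_W is the within-class scatter matrix, so correlations between the inputs are accounted for when choosing the projection. This is the textbook linear Fisher discriminant on hypothetical toy points, not TMVA's implementation:

```python
from statistics import mean

Point = tuple[float, float]


def fisher_weights(signal: list[Point], background: list[Point]) -> tuple[float, float]:
    """Return w = S_W^-1 (mu_s - mu_b) for two input variables."""
    mu_s = (mean(p[0] for p in signal), mean(p[1] for p in signal))
    mu_b = (mean(p[0] for p in background), mean(p[1] for p in background))

    # Within-class scatter matrix [[sxx, sxy], [sxy, syy]], summed over both classes.
    sxx = syy = sxy = 0.0
    for points, mu in ((signal, mu_s), (background, mu_b)):
        for x, y in points:
            dx, dy = x - mu[0], y - mu[1]
            sxx += dx * dx
            syy += dy * dy
            sxy += dx * dy

    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    d0, d1 = mu_s[0] - mu_b[0], mu_s[1] - mu_b[1]
    # Inverse of a symmetric 2x2 matrix: [[syy, -sxy], [-sxy, sxx]] / det
    return ((syy * d0 - sxy * d1) / det, (sxx * d1 - sxy * d0) / det)


def fisher_score(w: tuple[float, float], point: Point) -> float:
    """Project an event onto the Fisher direction (larger = more signal-like)."""
    return w[0] * point[0] + w[1] * point[1]


toy_signal = [(2.0, 3.0), (3.0, 4.0), (4.0, 5.0), (3.0, 3.0)]
toy_background = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
w = fisher_weights(toy_signal, toy_background)
print([round(fisher_score(w, p), 2) for p in toy_signal])
print([round(fisher_score(w, p), 2) for p in toy_background])
```

"Training" here is just estimating the class means and scatter from labelled samples; a cut on the resulting score then separates the classes.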
TMVA (Toolkit for MVA in ROOT)
New Boson Plot
H → ZZ → ℓℓℓℓ
Use five angles and two masses as discriminators
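Mass discriminators like these are built from lepton four-momenta via the invariant mass, m² = E² − |p|² (natural units). A minimal sketch, assuming (not stated on the slide) that the "two masses" are those of the two Z candidates, and treating leptons as (E, px, py, pz) tuples in GeV. The toy momenta are made up:

```python
import math

FourVector = tuple[float, float, float, float]  # (E, px, py, pz) in GeV


def invariant_mass(*particles: FourVector) -> float:
    """m = sqrt(E^2 - |p|^2) of the summed four-momenta."""
    e = sum(p[0] for p in particles)
    px = sum(p[1] for p in particles)
    py = sum(p[2] for p in particles)
    pz = sum(p[3] for p in particles)
    m2 = e * e - (px * px + py * py + pz * pz)
    return math.sqrt(max(m2, 0.0))  # clamp tiny negative values from rounding


# Made-up massless leptons: two back-to-back pairs.
l1, l2 = (50.0, 0.0, 0.0, 50.0), (50.0, 0.0, 0.0, -50.0)
l3, l4 = (30.0, 30.0, 0.0, 0.0), (30.0, -30.0, 0.0, 0.0)

m_z1 = invariant_mass(l1, l2)          # 100 GeV
m_z2 = invariant_mass(l3, l4)          # 60 GeV
m_4l = invariant_mass(l1, l2, l3, l4)  # 160 GeV
print(m_z1, m_z2, m_4l)
```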
Databases
not XLDBs though
Conditions Database
Largest database use (not in size, ~300GB)
Provides calibration, geometry and alignment information
Used by all running jobs
Can be more than 100k jobs worldwide
Network of squid caches used
Database queries transformed into HTTP requests
Home grown technology to achieve this (Frontier)
Works because data is written once, read many times
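The scheme works because identical read-only queries become identical, cacheable URLs. A minimal sketch of that idea only: the endpoint URL, the `q` parameter and the compress-then-base64 encoding are assumptions for illustration, not Frontier's actual protocol, and the dictionary stands in for a squid cache.

```python
import base64
import zlib
from typing import Callable
from urllib.parse import urlencode

FRONTIER_ENDPOINT = "http://frontier.example.org/query"  # hypothetical server URL


def query_to_url(sql: str) -> str:
    """Encode a database query as an HTTP GET URL (same query -> same URL)."""
    payload = base64.urlsafe_b64encode(zlib.compress(sql.encode("utf-8"))).decode("ascii")
    return f"{FRONTIER_ENDPOINT}?{urlencode({'q': payload})}"


class SquidLikeCache:
    """Toy HTTP cache: only misses reach the backend server."""

    def __init__(self, fetch: Callable[[str], bytes]):
        self.fetch = fetch
        self.store: dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self.store:
            self.hits += 1
        else:
            self.misses += 1
            self.store[url] = self.fetch(url)
        return self.store[url]


# 1000 jobs asking for the same (hypothetical) alignment constants.
cache = SquidLikeCache(fetch=lambda url: b"calibration payload")
query = "SELECT payload FROM alignment WHERE run = 200000"
for _ in range(1000):
    cache.get(query_to_url(query))
print(cache.hits, cache.misses)  # 999 1
```

Caching without invalidation is only safe because conditions data are written once and never changed afterwards.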
Squids aggregate: 500k requests/min, 500MB/s
Offline servers: 4k requests/min, 0.5MB/s
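Reading the squid and server figures together gives a rough measure of how much the squids shield the central servers. A sketch, assuming the offline-server numbers are exactly the traffic that is not served from the squids:

```python
# Figures quoted above for the squid layer and the offline servers.
SQUID_REQUESTS_PER_MIN = 500_000
SQUID_MB_PER_S = 500.0
SERVER_REQUESTS_PER_MIN = 4_000
SERVER_MB_PER_S = 0.5

request_ratio = SQUID_REQUESTS_PER_MIN / SERVER_REQUESTS_PER_MIN  # 125.0
bandwidth_ratio = SQUID_MB_PER_S / SERVER_MB_PER_S               # 1000.0
# Assumption: every request reaching the servers is a squid miss.
request_hit_fraction = 1 - SERVER_REQUESTS_PER_MIN / SQUID_REQUESTS_PER_MIN  # about 0.992

print(f"{request_ratio:.0f}x requests, {bandwidth_ratio:.0f}x bandwidth, "
      f"{request_hit_fraction:.1%} of requests served from cache")
```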
Other Databases
PhEDEx: Manages file transfers
Single Oracle instance at CERN
DBS: Dataset Bookkeeping System
Contains meta-data about datasets and files
Main instance in Oracle at CERN
User instances available elsewhere with MySQL
Job tracking databases
Use both Oracle and MySQL
Recent system archiving information in CouchDB
Reading Rate
[Plot: reading rate, with 6TB/day and 250TB/day marked]
Future Trends
…need to wear shades
Federated Storage
Aiming towards an architecture where all storage is visible globally
[Diagram: a User App opens /store/foo and is redirected to the Global Redirector, which queries the US and EU Redirectors (a further “??” Region is also shown); these query their sites (Site A to Site D) for /store/foo, the file is found at Site C, and the app is redirected via the EU Redirector to Site C, where it opens /store/foo]
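The redirection chain can be sketched as a tree search. A toy version, assuming (as the diagram suggests but does not spell out) that Sites A and B sit under the US redirector and Sites C and D under the EU one. A production system would more likely query its sites in parallel rather than walking them one by one as here:

```python
class Redirector:
    """A redirector (with children) or a site (a leaf holding files)."""

    def __init__(self, name: str, children: "list[Redirector] | None" = None,
                 files: "set[str] | None" = None):
        self.name = name
        self.children = children or []
        self.files = files or set()

    def locate(self, path: str) -> "list[str] | None":
        """Return the redirect chain from this node to a site holding `path`, or None."""
        if path in self.files:
            return [self.name]
        for child in self.children:
            chain = child.locate(path)
            if chain is not None:
                return [self.name] + chain
        return None


us = Redirector("US Redirector", [Redirector("Site A"), Redirector("Site B")])
eu = Redirector("EU Redirector", [Redirector("Site C", files={"/store/foo"}), Redirector("Site D")])
global_redirector = Redirector("Global Redirector", [us, eu])

print(global_redirector.locate("/store/foo"))
# ['Global Redirector', 'EU Redirector', 'Site C']
```

The user application only ever asks for a logical path such as /store/foo; where the bytes live is resolved by the redirectors.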
Clouds: for a rainy day
Helix Nebula
European initiative to provide unified system
Shows importance of standards
Proof of concept demonstrated on Amazon
Costs still prohibitive
Estimate order of magnitude
Running our own data centres more cost effective
May be interesting for adding short term capacity
Clouds: internal cloud
CERN moving to “agile” infrastructure
Commissioning new data centre in Hungary
Filter farm as cloud during LHC shutdown
Using OpenStack across 15k cores
Allows flexibility for redeployment
Farm also needed for detector work
Summary
Database technology used in various roles
Total size around 10TB: not huge
Our Big Data: 20PB RAW data
CMS uses worldwide computing infrastructure to deliver physics results
We’ve found a needle, now need to figure out what kind it is: http://lanl.arxiv.org/abs/1207.7235
XLDB Europe 2013 @ CERN
CERN will be happy to host a European Satellite XLDB
Planned date: 25 and 26 June 2013, during the LHC long shutdown, which will allow discussions on LHC data management issues to be included too
We invite everyone to help reach out to places in Europe with challenging XLDB-related issues; please contact [email protected] and