Data Logistics in Particle Physics: Ready or Not, Here it Comes…
Prof. Paul Sheldon
Vanderbilt University
Outline
How Strange is the Universe? 5 Modern Mysteries.
In trying to resolve these mysteries, particle physicists face a significant data logistics problem.
Solution should be flexible enough to encourage the creative approaches that will maximize productivity.
REDDnet breaks “data-tethered” compute model, allows unfettered access w/o strong central control.
Is the Universe Even Stranger Than We Have Imagined?
One piece of evidence: rotational velocities of stars in galaxies
Pick a star, how fast is it moving around galactic center?
Mass of galaxy is much, much larger than you get by counting the stars in the galaxy
$v^2 = \frac{GM}{r}$ (1st Year Physics!)
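The first-year-physics relation on this slide can be written out as a short derivation. This is a sketch under simplifying assumptions that the slide does not state: the star moves on a circular orbit of radius $r$, and the mass $M(r)$ enclosed within that orbit is distributed roughly spherically. The symbol $m$ (the star's mass) is introduced here only for the derivation and cancels out.

```latex
% Gravity supplies the centripetal force on a star of mass m at radius r:
\frac{m v^2}{r} = \frac{G\,M(r)\,m}{r^2}
\quad\Longrightarrow\quad
v^2 = \frac{G\,M(r)}{r}
\quad\Longrightarrow\quad
M(r) = \frac{v^2\, r}{G}
```

Read this way, a measured rotational velocity $v$ at radius $r$ gives an estimate of the enclosed mass, which can then be compared with the mass obtained by counting stars.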
We Don’t Know What The Majority of Matter in the Universe Is.
This “extra” matter is 90% of the Universe!
Conventional explanations have mostly been ruled out: planets, dust, …
Most of the matter in the Universe is probably an exotic form of matter — heretofore unknown!
But there is a good chance particle physicists will make some soon at the LHC at CERN!
[Figure: ~10% normal matter, ~90% “other” matter]
5 Mysteries for a New Millennium
What is the majority of matter in the universe made of?
Does space have more than three dimensions?
Where is all the anti-matter created by the Big Bang?
What is this bizarre thing called “Dark Energy?”
Why do things have mass?
Answering These Questions Presents Many Challenges…
Experiments require significant infrastructure and large collaborations: 2500 physicists!
CERN Large Hadron Collider: 2007 Start
27 km tunnel in Switzerland & France (100 m below ground)
CMS
Petascale Computing Required
[Figure: plot with y-axis in MSI2000 (0 to 350) and x-axis Year (2007 to 2010). Annotation: 2008: ~50,000 8 GHz P4s]
CMS will generate Petabytes of data per year and require Petaflops of CPU…
But physics is done in small groups, geographically distributed
Distributed Resources, People
Why Distributed Resources?
• Sociology
• Politics
• Funding
To maximize the quality and rate of scientific discovery, all physicists must have equal ability to access and analyze the experiment's data…
CMS Collaboration: >37 Countries, >163 Institutes
LHC Data Grid Hierarchy
[Figure: tiered data grid, from the experiment down to individual workstations]
Experiment, feeding the Online System
Tier 0 + 1: CERN Center (PBs of disk; tape robot)
Tier 1: FNAL Tier1, IN2P3 Tier1, INFN Tier1, RAL Tier1
Tier 2: Caltech Tier2, UERJ Tier2, other Tier2 Centers
Tier 3: Institutes (including Vanderbilt Tier3)
Tier 4: Workstations/Laptops
Also labeled: physics data cache
Link rates labeled in the figure: ~PByte/sec, ~150-1500 MB/s, 10-40+ Gbps, 10 Gbps, 1-10 Gbps, 1 to 10 Gbps
>10 Tier1 and ~100 Tier2 Centers
The small analysis groups doing the physics work at the Tier 3/4 level.
Data Logistics Yin and Yang
Uncertainty reigns at the most important level — where the physics will get done.
Physicists will evolve novel use cases that will not jibe with expectations or any plans/rules/edicts.
Level    Control               Infrastructure Ready?   Tested   Use Cases
Tier 0   Strong, Centralized   Most                    Much     Understood
Tier 4   Anarchy               Little/None             None     ?????
Use Cases: What we Do Know
Physicists will:
• need access to 10-100 TB data sets for short periods.
• run over this data many times, refining and improving their analysis.
• use local computing resources where they may not have much storage available.
• make “opportunistic use” of compute resources at Tier 3 sites and Grid sites.
• perform “production runs” at Tier 2 sites.
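To get a feel for what these use cases mean for the network, the sketch below estimates how long it takes to move a 10-100 TB data set over links in the 1-10 Gbps range quoted in the grid hierarchy. It is back-of-the-envelope arithmetic, not a measurement: it assumes decimal units (1 TB = 10^12 bytes, 1 Gbps = 10^9 bits/s), and both the function name `transfer_time_hours` and the `efficiency` parameter (the fraction of nominal link rate actually achieved) are hypothetical, introduced only for illustration.

```python
def transfer_time_hours(size_tb: float, rate_gbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move `size_tb` terabytes over a `rate_gbps` link.

    `efficiency` is an assumed fraction (0 < efficiency <= 1) of the nominal
    link rate that the transfer actually sustains.
    """
    if size_tb < 0 or rate_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, rate > 0, and 0 < efficiency <= 1")
    bits = size_tb * 1e12 * 8                 # terabytes -> bits (decimal units)
    seconds = bits / (rate_gbps * 1e9 * efficiency)
    return seconds / 3600.0


if __name__ == "__main__":
    # Data set sizes and link rates taken from the ranges quoted on the slides.
    for size_tb in (10, 100):
        for rate_gbps in (1, 10):
            hours = transfer_time_hours(size_tb, rate_gbps)
            print(f"{size_tb:>4} TB at {rate_gbps:>2} Gbps: {hours:7.1f} h")
```

Even at full nominal rate, 100 TB over a 1 Gbps link takes more than nine days, which is one way to see why copying a whole data set to each place a job might run is unattractive.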
REDDnet at Tier 3
Opportunistic computing vs data-tethered computing
– CMS has no formal solution for Tier 3 storage
– Compute on resources, even those where data is not hosted
On-demand working storage
– Improve data logistics
– Acts local: familiar user tools
Demonstrate at a Tier 3
– Performance
– Reliability
– … and convenience
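The contrast between data-tethered and opportunistic computing can be illustrated with a toy model. This is not REDDnet's interface or any CMS tool: the `Site` class, the two helper functions, and the example site and data set names are all hypothetical. The only idea taken from the slide is that a data-tethered job can run only where its data is hosted, while a job reading from shared working storage can run wherever CPUs are free.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A compute site in the toy model (hypothetical, for illustration only)."""
    name: str
    free_cpus: int
    hosted_datasets: set = field(default_factory=set)


def data_tethered_sites(sites, dataset, cpus_needed):
    """Sites usable when a job must run where its data set is already hosted."""
    return [s.name for s in sites
            if dataset in s.hosted_datasets and s.free_cpus >= cpus_needed]


def opportunistic_sites(sites, cpus_needed):
    """Sites usable when the job reads its data from shared working storage,
    so only free CPUs matter, not where the data set lives."""
    return [s.name for s in sites if s.free_cpus >= cpus_needed]


# Made-up example: the only site hosting the data set is fully busy.
sites = [
    Site("tier2_a", free_cpus=0, hosted_datasets={"dataset_x"}),
    Site("tier3_b", free_cpus=200),
    Site("grid_c", free_cpus=500),
]

if __name__ == "__main__":
    print("data-tethered:", data_tethered_sites(sites, "dataset_x", 100))
    print("opportunistic:", opportunistic_sites(sites, 100))
```

In this made-up example the tethered job has nowhere to run, while the opportunistic job has two candidate sites. The model ignores the cost of reading data remotely, which is exactly what a demonstration at a Tier 3 would need to measure.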
REDDnet
SC06 Depots
Near Term Plan of Work
Provide T3 scratch space
Host/mirror popular datasets on REDDnet
Participate in Data and Service Challenges
– Summer 07 Challenge starting soon
– Network and data transfer load tests
Integrate with existing CMS tools
Develop a Tier 3 analysis environment
– Initial small test community
– Test with individual analyses
– Run on the Grid