1
Towards an Upgrade TDR:
LHCb Computing workshop 18-22 May 2015
Introduction to Upgrade Computing Session
Peter Clarke
Many people's ideas: Vava, Conor, Mike Patrick, Marco, Concezio...
2
Process
Discussion document
Purpose: to start the “slow start” process (as in TCP) of getting people to think and contribute
It is understood that it is hard to get attention at the beginning of Run-II
It is understood that there is always a danger of starting too early
//svn.cern.ch/reps/lhcbdocs/Users/pclarke/notes/Discussion-TowardsComputungTDR
LHCb discussion list : [email protected]
This meeting
Organise that R&D happens in good time to produce reports in 2016 and 2017
Benefit from experience of Run-II
Write TDR
3
Schedule
Proposed schedule:
Q4 2015: Roadmap (of how we are going to get to TDR)
Q4 2016: Published notes on 2015/2016 experience
Q2 2017: R&D results written up
Q4 2017: Computing TDR submitted
Q4 2018: Computing model finalised
4
Questions to answer before we can write TDR
Lots of thoughts on ~ real time processing and data reduction
Requires change of mindset
Requires lots of testing
Trigger rate
Bandwidth, event size, rate
See next talks
Storage
What do we keep as RAW data, and how much can it be reduced?
Replica policy and development of intelligent data management (today's talks)
MDST
MC storage or regeneration
CPU
CPU is 80% for MC
FastMC
Optimisation
See Tuesday's talks
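The arithmetic behind the trigger-rate and CPU questions can be sketched in a few lines. This is only an illustration, not part of the computing model: the function names, the event size and the rate are hypothetical, and the only number taken from this talk is the 80% MC share of CPU. It assumes that bandwidth is simply event size times rate, and that a FastMC speed-up applies only to the MC share of the CPU budget.

```python
def bandwidth_bytes_per_s(event_size_bytes: float, rate_hz: float) -> float:
    """Output bandwidth, assumed to be event size times trigger output rate."""
    return event_size_bytes * rate_hz


def relative_cpu(mc_fraction: float, mc_speedup: float) -> float:
    """Total CPU relative to today if only the MC share is made faster."""
    return (1.0 - mc_fraction) + mc_fraction / mc_speedup


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to make the arithmetic easy to follow.
    print(bandwidth_bytes_per_s(event_size_bytes=100e3, rate_hz=10e3))
    # The 0.8 MC fraction is the figure quoted above; the speed-up is assumed.
    print(relative_cpu(mc_fraction=0.8, mc_speedup=10.0))
```

With these assumptions a tenfold faster MC would bring the total CPU need down to roughly 28% of today's, since the remaining 20% of non-MC work is untouched. That is why the MC share is the obvious place to look for savings.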
5
Questions(2)
Distributed computing hierarchy
Flatten hierarchy: concentrate at fewer high-efficiency centres
The answer is as much political as technical (regional funding, willingness to pay for resources in one place). So maybe it is out of our hands
De-couple tape services from Tier-1
Use few high cost-efficiency tape vaults somewhere (at least for archive)
Possibly commercial
Review of main applications
Designed for mu of 0.4 and adapted ever since
Is there a “new branch” development (break with the past)?
Tuesday discussion
Data preservation
Scale of task will increase
Obligations will increase
We may even have our first requests by then, as it is greater than 5 years!
6
Questions(3): People model
Current people model for operations is CERN-centric
Pressure on a limited number of CERN-based people
Clearly it worked for Run-I and presumably will for Run-II
But it is too reliant upon the good will of a small set of people
Must it be this way? Is this what we want in Run-III?
We may try to de-centralise
Pass specific responsibilities to external groups
Of course this has been tried before with limited success, but it is perhaps worth trying again
We should survey the other experiments
7
Immediate needs
All of these things (and more) need investigation
Work between now and ~ Q2 2017
Experience from Run-II
Some changes will be breaking changes
Need to plan when technical break occurs
Need to plan when psychological break occurs for collaboration members
Of course many of these areas are on-going work and it is important to recognise and appreciate this
Nevertheless I am personally convinced that it is essential to identify people to be “project managers” responsible for at least some of these topics
Writing down the work needed
Timeline planning
Report production
Of course I am not oblivious to the discussions earlier in the week on chronic lack of suitable effort
Maybe we need to divert people from Run-II anyway!
This may then have a detrimental effect on ability to deal with Run-II data, but we may simply not have the choice if we are to be ready for 2020!
8
Immediate needs
If there were a team of people available in an institute then we would do this properly with project managers now. There is not
An upgrade detector has a dedicated set of engineers who are disjoint from those running LHCb in Run-II
In computing it's the same people
Some things are in the category that they will go on anyway
E.g. Data Popularity (Intelligent Data Management): today's talk
Probably fine to expect progress and a report ready for TDR
Some things may be ok to only start after the TDR in LS2 (when effort becomes available)
Developing means to regenerate MC (and not store it)
But some things are just like a detector and need a serious planned programme starting now
Event oriented processing
Full tests of highly reduced data sets
FastMC?
It is this last category which we do not have the >50% FTE project managers for
Will only happen as a secondary/best efforts task
May become too late for 2020
9
Immediate needs
Efforts need to be redoubled to educate our collaboration members
Software is no more or less a professional task than building a detector
If the effort is not found – things will not be done – and Run-III will suffer
Group Leaders must be asked again (futile as it may be) to consider prioritising people in their groups to do software tasks.
Means need to be found to hire the missing project managers (or to displace existing people), by charging institutes or else by charging M&O
Continue all efforts to ensure software work is appreciated [this is complex]
Management needs to support this position so that it is not just Marco and me pleading.
10
LHCb Collaboration meeting photo, 2019