
Page 1: CHEP 2003 General Summary

CHEP 2003 General Summary

Torre Wenaus, BNL/CERN

CHEP 2003, UC San Diego, La Jolla

March 28, 2003

Page 2: CHEP 2003 General Summary

I agree with all the other summaries.

Thank you to the organizers, and have a safe journey home.

Page 3: CHEP 2003 General Summary


Outline: The CHEP03 Zeitgeist

- Themes and observations
- Rising trends
- Important developments
- Receding trends
- Underrepresented
- Open questions
- Concerns
- Major challenges
- Conclusions
- Thanks

Google zeitgeist: http://www.google.com/press/zeitgeist.html

zeit·geist | Pronunciation: 'tsIt-"gIst, 'zIt | Function: noun | Etymology: German, from Zeit (time) + Geist (spirit) | Date: 1884 | Meaning: the general intellectual, moral, and cultural climate of an era

Page 4: CHEP 2003 General Summary


Themes and observations

- Lesson from the past: make it simple (R. Brun)
  - No more complex than necessary
  - Users want consolidation, ease of use, and stability
  - Must also consider the needs of the future: a longer view of maintainability and evolution, in the interests of long-term stability
- OO and C++ is the accepted paradigm
  - No major OO/C++ migration or usage angst at this conference; it is done and accepted
  - Offline and online: “Triumph of C++ for HEP DAQ confirmed” (DAQ summary)
  - Now we are hearing reports on Nth-generation C++ software
    - L. Sexton-Kennedy, CDF: every component has been rewritten at least once; implementations have now stabilized such that every new arrival doesn’t start by discarding and rewriting software
    - “Many more talks about redesign than about design” (Data management summary)
  - And on the maturation and emergence of tools as broad standards, after years of development and refinement, e.g. Geant4, ROOT I/O

Page 5: CHEP 2003 General Summary


Themes and observations

- The tyranny of Moore’s Law
  - Wolbers: it is not a substitute for more efficient & faster code and smaller data size
  - It works against thinking before doing; optimize wherever possible
- Addressing the digital divide in networking (H. Newman)
  - HEP is obligated as a community to work on this
  - A world problem in which our field can have visible impact
- Farm challenges
  - Don’t underestimate farm installation and operations (R. Divia)
  - Big issues are power, cooling, space! (S. Wolbers)
  - Watts/$ steadily rising (R. Mount)
- The tape-disk random access performance gap in analysis is receding as an issue, but the disk-memory gap is hardly being addressed (R. Mount)

Page 6: CHEP 2003 General Summary


Random-Access Storage Performance

[Figure: retrieval rate (MBytes/s) vs. log10(object size in bytes), comparing PC2100 memory, a WD 200 GB disk, and an STK 9940B tape drive. R. Mount]

Page 7: CHEP 2003 General Summary


Rising trends

- ROOT
  - For analysis, I/O, and much else
  - Now fully supported at CERN: EP/SFT section
  - Close interaction with experiments on new developments: Run II, RHIC, ALICE, LCG, BaBar, …
  - Foreign classes, PROOF, geometry, grid integration, …
  - Mentioned in 47+ talks at this conference
- Open source databases (MySQL, Postgres, …)
  - Metadata, distributed computing, conditions, …
  - Empowering software: easy and potent
  - MySQL mentioned in 37 talks! Postgres in 8, Oracle in 27
- Online-offline continuum
  - Similar Linux farm environments, attainable time budgets
  - Same framework, maybe same algorithms, in HLT as in offline (V. Boisvert, ATLAS)
  - Stringent performance/robustness requirements on software

Page 8: CHEP 2003 General Summary


Rising trends

- Common projects
  - Joint projects are one of the CDF/D0 successes (Wolbers), but it is hard to align running experiments with the LHC
  - The LHC Computing Grid project
  - Grid projects in general
  - Laudable but difficult; increasingly forced by the circumstances: resource constraints and increasing scale and complexity make go-it-alone N times too costly
  - cf. comments in the online/DAQ context by G. Dubois-Feldmann today: somewhat less success in online, where it is even harder than offline, but possible LHC inroads
- Related is software reuse… respect what we know about long software development timescales

Page 9: CHEP 2003 General Summary


[Figure: R. Brun’s “time to develop” plot, annotated “LCG?”]

Page 10: CHEP 2003 General Summary


LCG must effectively re-use and leverage existing software, or fail

[The “time to develop” plot with the “LCG?” annotation is shown again]

This is the approach taken: cf. POOL, SEAL talks. Time will tell! cf. next CHEP

Page 11: CHEP 2003 General Summary


Rising trends

- Modular component architectures
  - Many examples in offline; also in online/DAQ (XDAQ, CMS) [also in open source…]
  - Associated infrastructure: white boards, centrality of the dictionary, plug-ins, …
- XML
  - The no-brainer for small-scale structured data storage and exchange
  - [The more humane applications leave the XML generation to the computer and not the humans]
  - ASCII lovers [count me in] now have their standard
  - Many talks in many areas involving XML applications: detector description, conditions info, configuration, monitoring, graphics, object models, data/object interchange, dictionary generation, not to mention layered apps (e.g. SOAP)…
  - 37 talks mention XML (same 37 as MySQL?)
  - But XML in itself does not define a common format/schema, and much divergence and duplication exists in how XML is used, e.g. detector description
  - We heard (I. Foster) about an OGSA community clearing house; we have similar things ourselves (CLHEP, FreeHEP); maybe we need one for XML applications

Page 12: CHEP 2003 General Summary


Rising trends

- Open source in general
  - “Open source, please. Your interests rarely in commercial vendor’s interests” (M. Purschke, PHENIX)
  - In the CDF/D0 success column; similarly all over: DBs, Qt, utility libraries, … and Linux, it goes without saying
  - Extraordinary capability and quality
- Java, to a degree
  - Important limitations being addressed, e.g. manageable C++ interoperability (JACE autogeneration of interfaces)
  - JAS, NLC software, IceCube, CDMS DAQ, …
  - But not broadly competing with C++ in usage so far
- HENP as CS partner and collaborator
  - To our mutual benefit in the Grid and in networking

Page 13: CHEP 2003 General Summary


Rising trends

- “New” simulation engines: Geant4, FLUKA
- Geant4 as a production tool
  - In production in BaBar: EM validation in hand, hadronic beginning, robust and reasonably fast
  - ATLAS on the way to completing the G4 transition after two years of physics validation
  - CMS, LHCb also transitioning over the next year
  - GLAST using the LHCb/Gaudi Geant4 interface
- FLUKA not new (established and widely used), but new integration efforts as a detector simulation engine for the four LHC experiments
  - FLUGG interface to G4 geometry
- ALICE Virtual Monte Carlo as a uniform interface to multiple engines (FLUKA, Geant4, Geant3)
  - Interest from other experiments; joint LCG project starting
  - Used for Geant4 testing; FLUKA integration in progress

Page 14: CHEP 2003 General Summary


Rising trends

- Automation in software development/management
  - Heard about several automated tools for code building and testing, release integration & tag management, and configuration management
  - Popular new software web portal at CERN LCG/SPI: http://savannah.cern.ch
  - Automated textual and statistical analysis of test outputs

Page 15: CHEP 2003 General Summary


Rising trends – The Grid

- The central importance of distributed computing to future (increasingly, present) HENP is long known
- ‘The Grid’ as the means to that is now established
- Major, broad successes in funding and in attracting collaboration with CS
  - F. Berman, Grid 2003: “HEP has set a model for integration, focus, coordination”
- Progress in applying Grid software and infrastructure to real problems
  - Batch production
- Clearly the chosen path; success to be proven, but has promise and broad commitment

Page 16: CHEP 2003 General Summary


The Grid

- F. Berman, Grids on the horizon:
  - Must be useful, usable, stable; supported
  - More cooperative than competitive [not always the case today!]
  - Applications are key to success
  - Not a “Field of Dreams” (“build it and they will come”) R&D field any more
  - Grid killer app: a focus on data. A good match to us
  - Still a long way to go

Page 17: CHEP 2003 General Summary


The Grid

- Miron Livny:
  - Benefit to science: democratization of computing
  - Still very manpower intensive: when the support team goes on holiday, so does the Grid (CMS testbed in December)
  - Best-practice middleware requires true collaboration, “open minds” (cf. Berman); testing, deployment/adoption, evaluation metrics, robustness, professional support, longevity, responsiveness to show stoppers, …
  - Much to do and improve, but important progress, e.g. VDT as a standard middleware suite

Page 18: CHEP 2003 General Summary


Important developments

- Community consensus on a C++ object store: ROOT I/O
  - Though many approaches to its use
  - Combined with an RDBMS for physics data storage: CDF, RHIC, LHC, BaBar, GLAST, … (a minimal usage sketch follows this list)

- Software engineering is catching up to us (F. Carminati)
  - “High ceremony processes” are not an obvious success, and we are not alone…
  - “Agile methodologies”, Extreme Programming (XP), is SE’s response
    - Extremely close to a successful HENP working model: adaptive, simple, incremental, tight iterations, plan for change, adjust the methodology for your environment
    - “I just learned we use XP”, comment from CDF
  - A means of responsibly formalizing and addressing, in a useful way, software engineering in HEP, and software management
  - Both must be effective and lightweight: Agile
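To make the ROOT I/O point above concrete, here is a minimal, hedged sketch of what “a C++ object store” means at its simplest: an object is streamed into a machine-independent file and later retrieved by its key name. TFile and TH1F are the standard ROOT classes; the file name, histogram name, and binning below are invented for illustration.

```cpp
// Minimal sketch (illustrative names): ROOT I/O as a C++ object store.
#include "TFile.h"
#include "TH1F.h"

int main() {
  // Write: an object known to the ROOT dictionary is serialized into the file.
  {
    TFile out("demo.root", "RECREATE");
    TH1F h("h_pt", "Transverse momentum;p_{T} [GeV];entries", 100, 0., 50.);
    h.Fill(12.3);
    h.Write();            // stream the histogram into the file by key name
  }                       // histogram detaches and the file closes here

  // Read: retrieve the same object back by its key.
  TFile in("demo.root", "READ");
  TH1F* back = static_cast<TH1F*>(in.Get("h_pt"));
  if (back) back->Print();  // confirm the object round-tripped
  return 0;
}
```

Experiment frameworks layer event models, navigation, and (per the bullet above) RDBMS-held metadata on top of this basic object streaming.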

Page 19: CHEP 2003 General Summary


Important developments

- Major strides in networking
  - HENP is a leading applications driver and a co-developer of global networks (H. Newman)
  - Require rapid global access to event samples and analyzed physics results drawn from massive data stores: PB by 2002, ~100 PB by 2007, ~1 Exabyte by ~2012
  - Rate of progress >> Moore’s Law
    - Factor of ~1M in 1985-2005 (~5k during 1995-2005) in global HENP network bandwidth
    - Factor of 25-100 gain in maximum sustained throughput in 15 months on some US + transatlantic routes
  - Network providers see us as an opportunity because we push real production applications
  - Future promise: OptIPuter (P. Papadopoulos): “Key driving applications keep the IT research focused”

Page 20: CHEP 2003 General Summary


Important developments

- The LHC Computing Grid Project
  - A major new internationally supported effort to build the distributed computing environment of the LHC
  - Encompasses:
    - the distributed computing facility: site fabrics (facilities), middleware selection, integration, testing, deployment at distributed sites, operations and support, …
    - the common physics applications software: persistency, core libraries and services, physics analysis interfaces, simulation and other frameworks, all in a distributed environment
  - Must succeed if LHC computing is to succeed!
  - An impressive effort by the experiments, together with CERN, to work in accord across the scope of computing
  - Managed so as to ensure comprehensive oversight by the experiments
  - First testbed deployment is this summer (LCG-1), including the first major applications deployment, the POOL persistency framework (ROOT I/O + MySQL hybrid)

Page 21: CHEP 2003 General Summary


Important developments

- Success of mass stores
  - Castor “reliable and effective” (ALICE)
  - D0/CDF convergence on successful Enstore/SAM
  - HPSS successful at RHIC
- Exciting new generation of specialized lattice gauge computers (B. Sugar)
  - Two tracks:
    - QCD on a chip: QCDOC, a “technical marvel”, project with IBM; $1M/Tflop, aiming at 10+ Tflop at BNL in ’04
    - Optimized commodity clusters: Pentium 4, Myrinet/Gbit Ethernet; 10+ Tflop at FNAL and JLab by ’06
  - SciDAC grant to improve software usability

Page 22: CHEP 2003 General Summary


Receding trends

- Objectivity, and ODBMS in general
  - “Jury still out” at CHEP 2000 (P. Sphicas), but now clear
  - Objectivity dropped or being phased out by the LHC experiments, COMPASS, and the BaBar event store
  - In PHENIX “becoming a liability” (compiler issues); augmented with RDBMSs
  - Not due to technical failure but a mix of technical problems, commercial concerns, manpower costs, and the availability of an alternative
  - Its replacements are not other ODBMSes but files (often ROOT) + an RDBMS (MySQL, Oracle, Postgres, …) for metadata (a sketch of this hybrid pattern follows this list)
- Magnetic tape (apart from archival)
  - PASTA: “unlimited” multi-PB disk caches technically possible, but the true cost is unclear (reliability, manageability)
  - File system access under urgent investigation
  - “Tapes as random access device no longer a viable option”; large disk caches needed for LHC analysis
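As an entirely hypothetical illustration of the “files + RDBMS for metadata” pattern noted above: the relational database answers the bookkeeping question (which file holds a given run), and ROOT I/O then reads the payload from the file it names. The run_files table, its columns, and the connection parameters are invented for this sketch; the MySQL C API and TFile calls themselves are standard.

```cpp
// Hypothetical sketch of the "files + RDBMS for metadata" pattern:
// run/file bookkeeping in MySQL, event payload in a ROOT file.
#include <mysql/mysql.h>
#include <cstdio>
#include <string>
#include "TFile.h"

int main() {
  // 1) Metadata lookup in the relational database (invented table/credentials).
  MYSQL* db = mysql_init(nullptr);
  if (!mysql_real_connect(db, "dbhost", "reader", "secret",
                          "bookkeeping", 0, nullptr, 0)) {
    std::fprintf(stderr, "connect failed: %s\n", mysql_error(db));
    return 1;
  }
  if (mysql_query(db, "SELECT file_name FROM run_files WHERE run=1234") != 0) {
    std::fprintf(stderr, "query failed: %s\n", mysql_error(db));
    mysql_close(db);
    return 1;
  }
  MYSQL_RES* res = mysql_store_result(db);
  MYSQL_ROW row = res ? mysql_fetch_row(res) : nullptr;
  std::string fileName = (row && row[0]) ? row[0] : "";
  if (res) mysql_free_result(res);
  mysql_close(db);

  // 2) Payload access: the event objects themselves come back via ROOT I/O.
  if (!fileName.empty()) {
    TFile f(fileName.c_str(), "READ");
    f.ls();   // list the objects stored in the file
  }
  return 0;
}
```

The division of labor is the point: small, queryable metadata lives in the RDBMS, while the bulk object data stays in ordinary files.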

Page 23: CHEP 2003 General Summary


Receding trends

- Commercial software? No…
  - Some in decline (Objectivity, LHC++), but new prospects opening (IBM, Sun, MS, …) in the Grid
  - Open source now has an important commercial element we derive great benefit from (even post-.com crash): Red Hat, MySQL, Qt, …

Page 24: CHEP 2003 General Summary


Underrepresented

- Collaborative tools
  - Were represented this week, but only lightly
  - Vital for distributed collaboration on software development and physics analysis
  - H. Newman: need a culture of collaboration; distributed and remote collaboration should be the norm
  - Not solely, or even predominantly(?), a matter of tool development in the community
    - How is the exponentially growing commercial side evolving, and how can we leverage it?
    - What is the evolutionary path, strategy, and role for community-developed tools such as VRVS?
    - Why is the user experience often poor? Poor physical facilities/configurations, instabilities, heterogeneous tools/protocols, support issues, …
    - Current experience sometimes competes unsuccessfully with the telephone, despite all its shortcomings

Page 25: CHEP 2003 General Summary


Open questions

- Distributed analysis
  - What will it look like? What development line(s) are taking us there?
  - Still very much R&D, pursued in multiple directions
  - Several models (e.g. R. Brun) with varying degrees of Grid exploitation/distributed character
  - H. Newman: where is the comprehensive global system architecture? M. Livny: we have to proceed incrementally, step by step, from the bottom up
  - Some efforts were reported which are incrementally extending established analysis tools into Grid-based analysis: PROOF, JAS
  - Others working from various starting points: Genius, Ganga, Clarens, …

Page 26: CHEP 2003 General Summary


Open questions

- Distributed analysis, continued
  - Production: environments more well-defined, tools more advanced, a few in production, varying levels of middleware usage
    - AliEn (ALICE), SAM-Grid (Run 2), CMS tool suite, GRAT, Magda (ATLAS), DIRAC (LHCb), …
    - Not a lot of sharing/collaboration above the middleware level!!
    - A necessary precursor to the more complex analysis environment, and hard in itself
  - What analysis improvements will the Grid really provide? (panel discussion)

Page 27: CHEP 2003 General Summary


What analysis improvements will the Grid really provide? (panel discussion)

Some of the comments… (what I heard, not what was said)

- Murphy’s Law needs to be beaten, not Moore’s Law (V. Innocente)
- From a technical point of view, the realization of a successful grid will be a single integrated distributed computing center (R. Mount)
- But beyond the technical, a successful grid will grow human resources, drawing in distributed people not otherwise involved, as well as material resources (M. Kasemann)
- The grid is more than this. The LHC will build the first global collaboration, reaching out to uninvolved countries. This incurs on us an obligation: through the grid we must make their participation possible and their resources useful. (H. Newman)
- It is an unprecedented opportunity to screw up. But we have no choice; we cannot put it all in one place. Focus on reliability.

Page 28: CHEP 2003 General Summary


Grid panel 2

The grid is something new. We can’t let a ‘one virtual computing center’ be the dominating thing. There should be no dominant force and we should avoid centralized decision making. This will help analysis. (L. Robertson)

Grids enable collaboration at a scale not attempted before. Distributed efforts are motivated to compete with one another and with the central site, and this brings benefits and resources. Analysis groups are teams, spread across continents and time zones. How do they collaborate? The grid should provide the solution. Also, provenance is largely overlooked, but it is key to analysis. (P. Avery)

We have no model for how 5000 users will use a globally distributed facility. System issues must be addressed now. (H. Newman)

Physicists should not see the grid at all. It should be transparent. (P. Mato)

Page 29: CHEP 2003 General Summary


Grid panel 3

The grid will be successful if we make it simple. Will force some coherence in the development of distributed analysis tools. Too much process will kill the process. There is not enough prototyping going on. (R.Brun)

Agree, we need more prototyping. We need candidate strategies, then build prototypes, and see what works. You have to do this before you will be able to abstract from experience and automate & make transparent the approaches that work. (H. Newman)

Funding agencies, computer scientists, and other sciences are excited by the HEP grid work, e.g. on provenance. Possibility of an infusion of funding. Could pursue a Google-like response to what now takes 3 months. (R. Mount)

The grid will enable collaborative work and harness distributed brainpower. It will allow HENP to be more present as a field at the home university. This is important for the health of our field. (H. Newman)

There is lots to learn from existing experiments. (R. Brun)

Page 30: CHEP 2003 General Summary


Open questions

- Impact of facility security on Grid computing
  - Site security in the grid era (Dane Skow)
    - Avoid complexity in designing security; it is the bane of secure systems
    - Must be agile in the face of change; resistant to attack
    - Risk management, not elimination; must accept some risk to carry on work
  - No clear answers in the bottom line; there is much yet to be resolved and understood, and many are working on it
  - A workable resolution is vital, since you don’t have a usable grid if the walls don’t have sockets

Page 31: CHEP 2003 General Summary


Open questions

- Impact of the OGSA migration (Globus) on middleware
  - Open Grid Services Architecture: leveraging industry-standard web services
  - Much industry involvement: IBM, Sun, NEC, Oracle, …
  - Attention given to backward compatibility
  - Promising approach; may the migration go well!
  - Alpha is under test; production release in June
  - A major dependency, given Globus’ foundation role in our middleware
  - The current Globus 2 will be supported for some time, but we will be interested in the new functionality

Page 32: CHEP 2003 General Summary


Open questions

- Utility and practicality of generate-on-demand virtual data (‘virtual data by materialization’)
  - Networking going well; the cost/complexity equation favors copying
  - Interesting talk (C. Jones) on successful implementation and use for many years in CLEO
    - Relies on user discipline to ensure regenerated data is trustworthy
  - The utility of data provenance management, needed for secure trust of on-demand data, is a separate question
    - Should have important utility, not only for virtual data (reproducibility, trust) but as a communication mechanism in widely distributed collaborations
    - Cannot allow reliance on hallway conversations with production gurus

Page 33: CHEP 2003 General Summary


Concerns

- Data analysis as “the last wheel of the car” (R. Brun)
  - Clear message from the current generation (e.g. Run 2, BaBar): don’t leave data analysis systems and infrastructure too late; it will lead to problems
  - Vastly more true when we are talking about doing globally distributed analysis for the first time, with unprecedented volume and complexity (e.g. Terabyte scale at the LHC), making distributed analysis both very difficult and mandatory
  - We cannot bootstrap ourselves into a global analysis system; it will take long incremental work, so we had better be working in a coordinated & effective way now
  - R. Brun: we will not converge on one system; there will be multiple competing systems, and that will not be bad [hopefully a small number]

Page 34: CHEP 2003 General Summary


Concerns

Are we doing enough to ensure senior people can contribute directly to physics analysis?

How do we interpret the fact (R. Brun) that PAW usage is still rising?

Has everyone bought the C++/OO paradigm shift? Are we developing and/or providing the right tools?

Is there enough engagement of senior physicists in the (limited) exploratory work being done on future physics analysis environments?

Almost certainly not, and it may be difficult to attract their attention unless/until attractive prototypes can be turned loose on them.

Page 35: CHEP 2003 General Summary


Major Challenges

Storage architecture “possibly biggest challenge for LHC” (PASTA)

Seamless integration from CPU caches to deep archive

Currently very poor data management tools for storage systems

More architectural work needed in next 2 years

Page 36: CHEP 2003 General Summary


Future ALICE Data Challenges

[Figure: ALICE Data Challenge performance (MBytes/s); annotations: new technologies (CPUs, servers, network). R. Divia]

Page 37: CHEP 2003 General Summary


Conclusions (1)

- Coming experiments must learn from prior generations: give early (i.e. for the LHC, immediate) attention to data analysis
  - It will take generations of incremental iterations of design, prototyping, and stressful deployment to get it right
  - Particularly in the unprecedented global collaborative environment of the LHC
- C++ is a mature and accepted standard
  - Several generations of C++ code in production experiments (BaBar, Run 2, …)
  - Maturation of tools into broad usage (Geant4, ROOT I/O)
  - No sign of a major new language migration so far [thank goodness]
  - But beware excessive complexity, and remember the promise of accessible, usable software

Page 38: CHEP 2003 General Summary


Conclusions (2)

- Grids and networking are making great strides
  - HENP is a successful and valued partner with CS
  - We provide a community focused on challenging large-scale deployments in real research settings
  - But Murphy’s Law is a potent adversary today; far from robust transparency, and much, much more to do
- Global collaborative computing must become a successful norm for us
  - Down to the global researcher at the home institute
  - Rich leadership potential for our field
- Important new common endeavours like the Grid and LCG have much invested in their success… it will be interesting to measure the degree of success at the next CHEP

Page 39: CHEP 2003 General Summary


Thanks

Thanks to Jim Branson and his team of organizers for giving us

A stimulating program and comfortable schedule

More-than-pleasant facilities and surroundings

Terrific banquet, I hear!

A very successful conference.

I for one will return to La Jolla any time

Page 40: CHEP 2003 General Summary

I agree with all the other summaries.

Thank you to the organizers, and have a safe journey home.