R-GMA and DØ. Iain Bertram, RAL, 13 May 2004. Thanks to Jeff Templon at Nikhef

Page 1:

R-GMA and DØ

Iain Bertram, RAL, 13 May 2004

Thanks to Jeff Templon at Nikhef

Page 2:

Background

• DØ uses SAM as its data grid (http://projects.fnal.gov/samgrid/)

• All official MC production is carried out off-site
  – i.e. not at FNAL
  – Output is stored in SAM

• A significant fraction of data reprocessing was carried out off-site
  – Data is accessed and stored in SAM

Page 3:

DØ and EDG/LCG

• The Nikhef group has implemented submission of DØ jobs on LCG
  – MC production
  – Data reconstruction
  – Notes from Jeff Templon

• Caveat: Jeff is the expert; I am not. I may therefore have trouble answering questions (my technical experts are at the four corners of the globe…).

Page 4:

Monitoring using RGMA

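• From within a Python script:
    import socket, sys
    # jstabl and start_time are defined earlier in the script
    worker_node = socket.getfqdn()
    # site = everything after the first dot of the worker node's FQDN
    site = worker_node[worker_node.find('.')+1:]
    jstabl.set_val('site', site)
    jstabl.set_val('start_time', start_time)
    cmdline = ' '.join(sys.argv)
    jstabl.set_val('command', cmdline)
    jstabl.insert()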

• Under the hood: R-GMA (EDG product)

• The backend can easily be replaced as long as we don't require more than "set_val" and "insert" … R-GMA has an SQL-like structure
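The point about replaceability can be illustrated with a minimal sketch: any object that offers "set_val" and "insert" could stand in for the monitoring table. The class below is hypothetical and is not the R-GMA API; it swaps in Python's built-in sqlite3 as the SQL-like store, and the table name, column names and helper function are assumptions made for illustration only.

```python
import sqlite3


class JobStatusTable:
    """Hypothetical stand-in for the monitoring table object (not R-GMA)."""

    # Assumed columns, taken from the three values the job script publishes.
    COLUMNS = ("site", "start_time", "command")

    def __init__(self, connection, table="job_status"):
        self._conn = connection
        self._table = table
        self._values = {}
        self._conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} "
            "(site TEXT, start_time REAL, command TEXT)"
        )

    def set_val(self, column, value):
        """Stage one column value for the next insert."""
        if column not in self.COLUMNS:
            raise KeyError(column)
        self._values[column] = value

    def insert(self):
        """Write the staged values as one row, then clear them."""
        if not self._values:
            raise ValueError("nothing to insert")
        cols = list(self._values)
        placeholders = ", ".join("?" for _ in cols)
        self._conn.execute(
            f"INSERT INTO {self._table} ({', '.join(cols)}) "
            f"VALUES ({placeholders})",
            [self._values[c] for c in cols],
        )
        self._conn.commit()
        self._values = {}


def site_from_fqdn(fqdn):
    """Return everything after the first dot (whole name if there is no dot)."""
    return fqdn[fqdn.find(".") + 1:]
```

A job script would then call `set_val` for each field and finish with a single `insert`, without knowing which backend receives the record.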

Page 5:

J. Templon Comments

• It was useful not to have to worry about the details of where the servers are; you issue commands such as
  – "DEFINE TABLE" and "INSERT" or "LATEST SELECT".
  – R-GMA looked like a giant distributed database.

• The SQL model worked well for what we wanted to do.

• The down side is that the archiver process is not ready for prime time.
  – It never stays up for more than a few days at a time, and it often dies in a way that fools the babysitting script into thinking that it is still alive.
  – This of course is deadly.
  – (The archiver is the thing that sucks in the published records from jobs and puts them in a database.)
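The failure mode described above, a process that still exists but has stopped doing work, can be illustrated with a small sketch. This is not the Nikhef babysitting script; it assumes a hypothetical setup in which the watched process records a timestamp each time it archives something, and all function names and the 600-second threshold are illustrative. The PID check is POSIX-only.

```python
import os
import time


def pid_is_running(pid):
    """Naive liveness check (POSIX only): does a process with this PID exist?"""
    try:
        os.kill(pid, 0)  # signal 0 only performs error checking
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def heartbeat_is_fresh(last_beat, max_age_seconds, now=None):
    """True if the last recorded activity is recent enough."""
    now = time.time() if now is None else now
    return (now - last_beat) <= max_age_seconds


def archiver_is_healthy(pid, last_beat, max_age_seconds=600, now=None):
    """Require both an existing process and recent activity.

    A hung process passes the PID check but fails the heartbeat check,
    which is the case a PID-only babysitter would miss.
    """
    return pid_is_running(pid) and heartbeat_is_fresh(
        last_beat, max_age_seconds, now
    )
```

A watchdog built this way would restart the process whenever `archiver_is_healthy` returns False, including when the process is present but silent.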

Page 6:

LCG/EDG Problems

• Single storage machine => bottleneck
  – "WP5" SEs
  – Traffic jams

• R-GMA not really stable until the end of December
  – Couldn't submit jobs
  – Missed monitoring records

• Software distribution reliable but extremely inefficient

• Poor submission command throughput

Page 7:

Plans

• All MC and data production will be running on the SAM computational grid by summer
  – MC by June 1
  – Data reprocessing scheduled for later this year
  – FNAL DØ farm will move to SAM-grid

• Plan to support interfaces to LCG for this processing
  – Runjob will interface directly to LCG

Page 8:

Needs

• Database proxy servers
  – Need to access trigger/calibration issues
  – Oracle database

• The DB proxy design is in principle generic, being based on CORBA (Common Object Request Broker Architecture), which wraps the SQL queries. A two-stage cache is used, RAM and disk, each with a configurable size; e.g. the caches we currently have configured are on the order of a couple of GB.

• Interface between SE and SAM?
  – Can store our files directly to SAM from LCG site
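The two-stage cache mentioned for the DB proxy can be sketched as follows. This is an illustration only, not the DØ proxy implementation: the class name, the `fetch` callback and the on-disk layout are hypothetical, CORBA is left out entirely, the RAM stage is bounded by entry count rather than bytes, and the disk stage is left unbounded for brevity.

```python
import hashlib
import os
import pickle
from collections import OrderedDict


class TwoStageQueryCache:
    """Hypothetical RAM-then-disk cache for SQL query results."""

    def __init__(self, disk_dir, max_ram_entries=128):
        self._ram = OrderedDict()  # stage 1: least recently used first
        self._max_ram_entries = max_ram_entries
        self._disk_dir = disk_dir  # stage 2: one pickle file per query
        os.makedirs(disk_dir, exist_ok=True)

    def _disk_path(self, sql):
        digest = hashlib.sha256(sql.encode("utf-8")).hexdigest()
        return os.path.join(self._disk_dir, digest + ".pkl")

    def _remember_in_ram(self, sql, result):
        self._ram[sql] = result
        self._ram.move_to_end(sql)
        while len(self._ram) > self._max_ram_entries:
            self._ram.popitem(last=False)  # evict least recently used

    def get(self, sql, fetch):
        """Return the result for sql, calling fetch(sql) only on a full miss."""
        if sql in self._ram:  # stage 1 hit
            self._ram.move_to_end(sql)
            return self._ram[sql]
        path = self._disk_path(sql)
        if os.path.exists(path):  # stage 2 hit
            with open(path, "rb") as handle:
                result = pickle.load(handle)
        else:  # miss: ask the real database and persist the answer
            result = fetch(sql)
            with open(path, "wb") as handle:
                pickle.dump(result, handle)
        self._remember_in_ram(sql, result)
        return result
```

In this sketch a query evicted from RAM is still served from disk, so the backing database is contacted only once per distinct query.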