Andrew C. Smith, 11th May 2007, EGEE User Forum 2 - DIRAC Data Management System
User Forum 2
Data Management System
“Tell me and I forget. Show me and I remember. Involve me and I understand.” Chinese proverb
Still to come…
LHCb Computing Model (simplified)
Requirements for Data Management System
Introduction to DIRAC
DIRAC Data Management System (DMS)
Core DIRAC DM Components
Bulk Transfer Framework
Data Driven Automated Transfers
Reliable Data Management
Overview of EGEE resources used
LHCb Computing Model (Simplified)
[Diagram labels: RAW Physics File, RAW Replication, Reconstruction Job, Reconstructed RAW File (rDST), Stripping Job, Stripped File (DST), DST Broadcast]
DM Requirements in Numbers
2GB RAW file every ~30s
Upload to Castor at 40MB/s on 1GB dedicated link.
Each RAW file replicated from Castor to 1 of LHCb’s 6 Tier1s using shared 10GB links. Aggregated 40MB/s.
Each Stripped DST produced is replicated to all Tier1s using dedicated network. Each Tier1 (on average) ~11MB/s in AND out.
Introduction to DIRAC
DIRAC is LHCb’s Grid Workload and Data Management System
Initial incarnation as LHCb production system
Since evolved into a generic Community Grid Solution
Either stand-alone environment or
Community Overlay Grid System (COGS)
Architecture based on Services and Agents
Implementing Service Oriented Architecture
VO specific utilities can be tailored as required
Demonstrated 10k concurrently running jobs
Management of O(10M) Data files and replicas
See Stuart Paterson’s talk for more on WMS
DIRAC Core Data Management System
[Diagram labels: Core DM Clients (User Interface, WMS, DM Agents), ReplicaManager, FileCatalogueA, FileCatalogueB, LCG File Catalogue, StorageElement, SE Service, SRMStorage, RFIOStorage, StoragePlugInX, Physical Storage]
Core DM Components
The main components are:
Replica Manager
File Catalogues
Storage Element and access plug-ins
DM Core Components
Replica Manager provides logic for DM operations
Interaction with StorageElement
File upload/download/removal to/from Grid, File replication across SEs
Interaction with File Catalog API
File/replica registration/removal, Obtain replica information
Logging of operations returned to client
StorageElement is an abstraction of a Storage facility
Access provided by plug-in modules for access protocols
Current plug-ins: srm, gridftp, bbftp, sftp, http
File Catalogue API
All file catalogues offer same interface
Can be used interchangeably
LCG File Catalog (LFC), ProcessingDB…
VO specific resources easily integrated
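
The layering described on this slide can be sketched roughly as follows. This is an illustrative sketch only, not DIRAC code: the class names `StoragePlugin`, `InMemoryPlugin` and `DictCatalogue`, and the method names `put`, `add_replica`, `get_replicas` and `put_and_register`, are assumptions chosen for the example.

```python
from abc import ABC, abstractmethod


class StoragePlugin(ABC):
    """Hypothetical base class: one subclass per access protocol."""

    @abstractmethod
    def put(self, local_path, remote_path):
        """Copy a local file to the storage; return True on success."""


class InMemoryPlugin(StoragePlugin):
    """Stand-in plug-in so the sketch runs without any Grid middleware."""

    def __init__(self):
        self.files = {}

    def put(self, local_path, remote_path):
        self.files[remote_path] = local_path
        return True


class StorageElement:
    """Abstraction of a storage facility, reached through plug-ins."""

    def __init__(self, name, plugins):
        self.name = name
        self.plugins = list(plugins)

    def put(self, local_path, remote_path):
        # Try each access plug-in in turn; stop at the first success.
        return any(p.put(local_path, remote_path) for p in self.plugins)


class DictCatalogue:
    """Any catalogue offering the same two methods can be swapped in."""

    def __init__(self):
        self.replicas = {}

    def add_replica(self, lfn, se_name):
        self.replicas.setdefault(lfn, set()).add(se_name)

    def get_replicas(self, lfn):
        return sorted(self.replicas.get(lfn, ()))


class ReplicaManager:
    """Combines storage and catalogue calls and keeps a log of operations."""

    def __init__(self, catalogues):
        self.catalogues = list(catalogues)
        self.log = []

    def put_and_register(self, lfn, local_path, se):
        ok = se.put(local_path, lfn)
        self.log.append(("put", lfn, se.name, ok))
        if ok:
            for catalogue in self.catalogues:
                catalogue.add_replica(lfn, se.name)
                self.log.append(("register", lfn, se.name, True))
        return ok
```

Because the Replica Manager only relies on the shared catalogue interface, adding a VO specific catalogue in this sketch means passing one more object in the `catalogues` list.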
Other Key Components
Data Management requests stored in RequestDB
XML containing parameters for DM operation
Operation type, LFN, etc.
Requests obtained and placed through RequestDB Service
Transfer Agent polls RequestDB Service for work (multi-threaded)
Contacts Replica Manager to perform DM operation
Full log of operations returned
Retries based on logging info
Until success
Redundancy built-in
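
The poll-and-retry behaviour of the Transfer Agent can be illustrated with a small sketch. It assumes requests are plain dictionaries with `operation` and `lfn` keys and that a caller-supplied `perform` function stands in for the Replica Manager; the function name `run_transfer_agent` and the `max_passes` bound are hypothetical, the bound being added only so the example always terminates.

```python
from collections import deque


def run_transfer_agent(requests, perform, max_passes=10):
    """Process requests, putting failed ones back in the queue for retry.

    requests:   iterable of dicts with "operation" and "lfn" keys
    perform:    callable(request) -> bool, stand-in for the Replica Manager
    max_passes: assumed safety bound; the slide retries "until success"
    """
    todo = deque(requests)
    done, log = [], []
    passes = 0
    while todo and passes < max_passes:
        passes += 1
        # Visit every request currently queued exactly once per pass.
        for _ in range(len(todo)):
            request = todo.popleft()
            ok = perform(request)
            log.append((request["lfn"], request["operation"], ok))
            if ok:
                done.append(request)
            else:
                todo.append(request)  # retried on the next pass
    return done, list(todo), log
```

The returned log plays the role of the "full log of operations" on the slide: each entry records which request was tried and whether it succeeded.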
[Diagram labels: Transfer Agent, ReplicaManager, RequestDBSvc, RequestDatabase, ToDo, Failed]
RAW Upload to Castor
[Diagram labels: LHCb ONLINE SYSTEM (Online Storage, Online RunDatabase, Data Mover), DIRAC @ Online Gateway (Transfer Agent, ReplicaManager, RequestDBSvc, RequestDatabase, RFIOPlugin, FC API), CERN-IT, ADTDB, LFC, rfcp, File movement, Request movement, ToDo, Done]
Bulk Data Transfers
gLite File Transfer Service (FTS)
Provides point-to-point reliable bulk transfers
Channel architecture
SURLs at SRM X to SURLs at SRM Y
Utilizing high throughput dedicated networks
Network resources pledged to WLCG
CERN-Tier1s
Tier1-Tier1 matrix
DIRAC DM System Interfaced to FTS
Use FTS CLI to submit and monitor jobs
DIRAC DM System
Scheduling and placement of transfers
Preparing source and target SURLs
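
As a rough illustration of preparing source and target SURLs per channel, the sketch below groups transfer requests by (source SE, target SE) pair. It is not the DIRAC or FTS implementation: the function name `prepare_transfers`, the request tuple layout, and the endpoint prefixes used in the usage example are all assumptions.

```python
from collections import defaultdict


def prepare_transfers(requests, endpoints):
    """Group transfer requests into per-channel lists of SURL pairs.

    requests:  iterable of (lfn, source_se, target_se) tuples
    endpoints: dict mapping an SE name to its (hypothetical) SURL prefix
    Returns a dict keyed by (source_se, target_se), i.e. one entry per channel.
    """
    channels = defaultdict(list)
    for lfn, source_se, target_se in requests:
        source_surl = endpoints[source_se] + lfn
        target_surl = endpoints[target_se] + lfn
        channels[(source_se, target_se)].append((source_surl, target_surl))
    return dict(channels)


# Usage with made-up endpoints; real SRM hosts and paths would differ.
example_endpoints = {
    "CERN": "srm://srm.cern.example/castor",
    "RAL": "srm://srm.ral.example/lhcb",
}
example_jobs = prepare_transfers(
    [("/lhcb/data/run1.raw", "CERN", "RAL")], example_endpoints
)
```

Each value in the result is the list of SURL pairs that would go into one bulk transfer job for that channel.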
[Diagram labels: DIRAC DM System (Transfer Agent, ReplicaManager), EGEE FTS Svc, SRM/G-U-C]
DIRAC DM components developed to perform data driven management
AutoDataTransferDB (AdtDB) contains pseudo file catalogue
Offers API to manipulate catalogue entries
Based on ‘transformations’ contained in the DB
Transformations defined for each DM operation to be performed
Defines source and target SEs
File mask (based on LFN namespace)
Number of files to be transferred in each job
Can select files of given properties and locations
Replication Agent manipulates AdtDB API
Checks active files in AdtDB
Applies mask based on file type
Checks the location of file
Files which pass mask and match SourceSE selected for transformation
Once threshold number of files found FTS jobs created
ReplicationAgent logic generalised to support multiple transformation types
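
The selection steps of the Replication Agent can be sketched as below. This is a simplified illustration rather than the AdtDB API: the dictionary keys `source_se`, `file_mask` and `files_per_job`, the shell-style mask syntax, and the function names are assumptions made for the example.

```python
from fnmatch import fnmatchcase


def select_files(active_files, transformation):
    """Return LFNs that match the file mask and sit at the source SE.

    active_files:   iterable of (lfn, se_name) tuples
    transformation: dict with "source_se", "file_mask", "files_per_job"
    """
    return [
        lfn
        for lfn, se_name in active_files
        if se_name == transformation["source_se"]
        and fnmatchcase(lfn, transformation["file_mask"])
    ]


def make_jobs(selected, transformation):
    """Split selected files into jobs once the threshold is reached.

    Only complete batches become jobs; leftover files are returned so
    they can wait until enough further files arrive.
    """
    size = transformation["files_per_job"]
    full = len(selected) - len(selected) % size
    jobs = [selected[i:i + size] for i in range(0, full, size)]
    return jobs, selected[full:]
```

Keeping the mask, source SE and batch size in the transformation record rather than in the agent is what lets the same loop serve several transformation types.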
Data Driven Production Management
RAW Replication
[Diagram labels: LHCb ONLINE SYSTEM (Online Storage, Online RunDatabase, Data Mover), DIRAC @ Online Gateway (Transfer Agent, ReplicaManager, RequestDBSvc, RequestDatabase, RFIOPlugin, FC API), CERN-IT, ADTDB, LFC, rfcp, File movement, Request movement, ToDo, Done]
When a file is uploaded to Castor it is registered in the AdtDB
This is the hook to data driven replication
RAW Replication II
[Diagram labels: DIRAC DM System (Transfer Agent, ReplicaManager, RequestDBSvc, RequestDatabase, Replication Agent, AdtDB), WLCG FTS Svc, SRM/G-U-C, Tier1 SRM, LFC, File movement, Request movement]
After replication, registration in LFC and ProcessingDB
ProcessingDB drives data driven reconstruction and stripping jobs
Reliable Data Management
LHCb dedicated VO Box provided at Tier1s
DIRAC instance installed
RequestDB service
TransferAgent
Provides failover mechanism
File upload from WN to associated SE
If upload fails, alternative SE chosen and ‘move’ request put to VO box
Also provided initial mechanism for DST distribution
DST uploaded to associated Tier1SE
‘Replication’ requests put to VOBoxes
Proven capable of 100MB/s integrated across all Tier1s.
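
The failover mechanism can be sketched as follows, under stated assumptions: `upload` is a caller-supplied function standing in for the real storage call, `request_queue` is a plain list standing in for the RequestDB on the VO box, and the function name and request fields are hypothetical.

```python
def upload_with_failover(lfn, associated_se, alternative_ses, upload, request_queue):
    """Upload to the associated SE, falling back to alternatives on failure.

    upload:        callable(lfn, se_name) -> bool
    request_queue: list collecting requests destined for the VO box
    Returns the SE name that received the file, or None if all failed.
    """
    if upload(lfn, associated_se):
        return associated_se
    for se_name in alternative_ses:
        if upload(lfn, se_name):
            # Leave a 'move' request so the file reaches its intended SE later.
            request_queue.append({
                "operation": "move",
                "lfn": lfn,
                "source_se": se_name,
                "target_se": associated_se,
            })
            return se_name
    return None
```

The job on the worker node can finish as soon as any SE accepts the file; the Transfer Agent on the VO box then works through the queued ‘move’ requests independently.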
Site (Country)  Tape Files  Tape Used (TB)  Disk Files  Disk Used (TB)
CERN (CH)          1660232          191.54      316503           59.10
CNAF (IT)           132229           19.68      160577           29.73
GRIDKA (DE)         110001           14.59      170008           31.83
IN2P3 (FR)           31604            4.93      294214           47.36
PIC (ES)            142520           19.37      128722           23.89
RAL (UK)            274816           37.12      215108           40.74
SARA (NL)            40849            5.70      161145           30.33
Use of Resources
During LHCb’s DC06 DIRAC’s DM System
Stored 3.8M files at CERN + Tier1s
292TB of tape
262TB of disk
+registration in the LCG File Catalogue
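
The quoted totals can be cross-checked against the per-site table with a few lines of arithmetic. The numbers are copied from the table; the variable names are chosen for this example only.

```python
# (tape files, tape TB, disk files, disk TB) per site, copied from the table
usage = {
    "CERN": (1660232, 191.54, 316503, 59.10),
    "CNAF": (132229, 19.68, 160577, 29.73),
    "GRIDKA": (110001, 14.59, 170008, 31.83),
    "IN2P3": (31604, 4.93, 294214, 47.36),
    "PIC": (142520, 19.37, 128722, 23.89),
    "RAL": (274816, 37.12, 215108, 40.74),
    "SARA": (40849, 5.70, 161145, 30.33),
}

total_files = sum(row[0] + row[2] for row in usage.values())
tape_tb = round(sum(row[1] for row in usage.values()), 2)
disk_tb = round(sum(row[3] for row in usage.values()), 2)

print(total_files)  # 3838528, i.e. about 3.8M files
print(tape_tb)      # 292.93
print(disk_tb)      # 262.98
```

The sums agree with the 3.8M files, 292TB of tape and 262TB of disk stated on the slide, with the TB figures truncated rather than rounded.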
[Pie chart: Disk Usage (TB) by site, total 262TB; per-site values as in the Disk Used (TB) column of the table]
DIRAC core DMS extensible, reliable, redundant
VO specific resources plug-able
5 years of experience managing LHCb data
Data driven operations to meet LHCb computing model
Initial upload of RAW physics files
Replication to Tier1s
Broadcast of DSTs
In the last year DIRAC DMS handled 3.8M files/replicas
292TB of tape
262TB of disk
Summary
Questions…?