
Page 1

Andrew C. Smith, 11th May 2007, EGEE User Forum 2 - DIRAC Data Management System

User Forum 2

Data Management System

“Tell me and I forget. Show me and I remember. Involve me and I understand.” Chinese proverb

Page 2

Still to come…

LHCb Computing Model (simplified)
Requirements for Data Management System
Introduction to DIRAC
DIRAC Data Management System (DMS)
Core DIRAC DM Components
Bulk Transfer Framework
Data Driven Automated Transfers
Reliable Data Management
Overview of EGEE resources used

Page 3

LHCb Computing Model (Simplified)

[Diagram labels: RAW Physics File, RAW Replication, Reconstruction Job, Reconstructed RAW File (rDST), Stripping Job, Stripped File (DST), DST Broadcast]

Page 4

DM Requirements in Numbers

2GB RAW file every ~30s. Upload to Castor at 40MB/s on 1GB dedicated link.

Each RAW file replicated from Castor to 1 of LHCb’s 6 Tier1s using shared 10GB links. Aggregated 40MB/s.

Each Stripped DST produced is replicated to all Tier1s using dedicated network. Each Tier1 (on average) ~11MB/s in AND out.
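To put the quoted rates in perspective, the short calculation below converts them into daily volumes. It is only a back-of-the-envelope sketch: it assumes the rates are sustained for a full 24 hours, uses decimal units (1 TB = 10^6 MB), and assumes RAW replication is split evenly across the 6 Tier1s, none of which is stated on the slide.

```python
SECONDS_PER_DAY = 24 * 60 * 60

RAW_RATE_MB_S = 40   # aggregated RAW rate quoted on the slide
DST_RATE_MB_S = 11   # average per-Tier1 DST rate quoted on the slide
N_TIER1 = 6          # number of LHCb Tier1s quoted on the slide


def daily_volume_tb(rate_mb_s):
    """Volume moved in one day at a sustained rate (assumes 1 TB = 1e6 MB)."""
    return rate_mb_s * SECONDS_PER_DAY / 1e6


def even_share(rate_mb_s, n_sites):
    """Per-site rate if the aggregate were split evenly (an assumption)."""
    return rate_mb_s / n_sites


raw_tb_per_day = daily_volume_tb(RAW_RATE_MB_S)
dst_tb_per_day_per_tier1 = daily_volume_tb(DST_RATE_MB_S)
raw_mb_s_per_tier1 = even_share(RAW_RATE_MB_S, N_TIER1)

print(f"RAW to Castor: {raw_tb_per_day:.2f} TB/day")
print(f"RAW per Tier1 (even split): {raw_mb_s_per_tier1:.1f} MB/s")
print(f"DST per Tier1, each direction: {dst_tb_per_day_per_tier1:.2f} TB/day")
```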

Page 5

Introduction to DIRAC

DIRAC is LHCb’s Grid Workload and Data Management System
  Initial incarnation as LHCb production system
  Since evolved into a generic Community Grid Solution
    Either stand-alone environment or Community Overlay Grid System (COGS)
Architecture based on Services and Agents
  Implementing Service Oriented Architecture
  VO specific utilities can be tailored as required
Demonstrated 10k concurrently running jobs
Management of O(10M) Data files and replicas
See Stuart Paterson’s talk for more on WMS

Page 6

Core DM Components

The main components are:
  Replica Manager
  File Catalogues
  Storage Element and access plug-ins

[Diagram: DIRAC Core Data Management System. Labels: User Interface, WMS, DM Agents, Core DM Clients, ReplicaManager, LCG File Catalogue, FileCatalogueA, FileCatalogueB, StorageElement, SE Service, SRMStorage, RFIOStorage, StoragePlugInX, Physical Storage]

Page 7

DM Core Components

Replica Manager provides logic for DM operations
  Interaction with StorageElement
    File upload/download/removal to/from Grid, File replication across SEs
  Interaction with File Catalog API
    File/replica registration/removal, Obtain replica information
  Logging of operations returned to client
StorageElement is an abstraction of a Storage facility
  Access provided by plug-in modules for access protocols
    Current plug-ins: srm, gridftp, bbftp, sftp, http
File Catalogue API
  All file catalogues offer same interface
    Can be used interchangeably
    LCG File Catalog (LFC), ProcessingDB….
  VO specific resources easily integrated
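The layering described on this slide can be illustrated with a small sketch. All class and method names below (StoragePlugin, FakePlugin, DictCatalogue, put_and_register and so on) are hypothetical and are not taken from DIRAC; the plug-ins only record files in memory. The point is the shape: a StorageElement hides the access protocol, every catalogue exposes the same interface, and the Replica Manager returns a log of what it did.

```python
from abc import ABC, abstractmethod


class StoragePlugin(ABC):
    """One access protocol for a storage facility (hypothetical interface)."""

    protocol = "unknown"

    @abstractmethod
    def put(self, local_path, remote_path):
        """Return True if the file was written."""


class FakePlugin(StoragePlugin):
    """Stand-in plug-in that records files in memory instead of moving data."""

    def __init__(self, protocol, working=True):
        self.protocol = protocol
        self.working = working
        self.files = {}

    def put(self, local_path, remote_path):
        if self.working:
            self.files[remote_path] = local_path
        return self.working


class StorageElement:
    """Abstraction of a storage facility reached through protocol plug-ins."""

    def __init__(self, name, plugins):
        self.name = name
        self.plugins = list(plugins)

    def put_file(self, local_path, remote_path):
        # Try each plug-in in turn; report which protocol succeeded, if any.
        for plugin in self.plugins:
            if plugin.put(local_path, remote_path):
                return plugin.protocol
        return None


class DictCatalogue:
    """Minimal catalogue; every catalogue offers the same two methods."""

    def __init__(self):
        self.replicas = {}

    def add_replica(self, lfn, se_name):
        self.replicas.setdefault(lfn, set()).add(se_name)

    def get_replicas(self, lfn):
        return sorted(self.replicas.get(lfn, ()))


class ReplicaManager:
    """Uploads a file, registers it in all catalogues, returns a log."""

    def __init__(self, catalogues):
        self.catalogues = list(catalogues)

    def put_and_register(self, local_path, lfn, se):
        log = [("put", se.name, se.put_file(local_path, lfn))]
        if log[0][2] is None:
            return log  # upload failed, nothing to register
        for catalogue in self.catalogues:
            catalogue.add_replica(lfn, se.name)
            log.append(("register", type(catalogue).__name__, lfn))
        return log
```

Because the catalogues share one interface, adding a VO specific catalogue in this sketch means passing one more object to ReplicaManager; nothing else changes.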

Page 8

Other Key Components

Data Management requests stored in RequestDB
  XML containing parameters for DM operation
    Operation type, LFN, etc….
  Requests obtained and placed through RequestDB Service
Transfer Agent polls RequestDB Service for work (multi-threaded)
  Contacts Replica Manager to perform DM operation
  Full log of operations returned
  Retries based on logging info
    Until success
Redundancy built-in

[Diagram labels: Transfer Agent, ReplicaManager, RequestDBSvc, RequestDatabase; request states: ToDo, Failed]
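The request and retry cycle can be sketched as follows. The XML layout (a request element with an operation attribute and lfn and targetSE children) is invented for illustration, since the slide does not give the real schema, and the function names are hypothetical. The sketch is single-threaded and caps the number of polling cycles so the example always terminates, whereas the slide describes a multi-threaded agent that retries until success.

```python
import xml.etree.ElementTree as ET
from collections import deque


def make_request(operation, lfn, target_se):
    """Serialise a DM request as XML (hypothetical schema)."""
    req = ET.Element("request", {"operation": operation})
    ET.SubElement(req, "lfn").text = lfn
    ET.SubElement(req, "targetSE").text = target_se
    return ET.tostring(req, encoding="unicode")


def run_agent(todo, perform, max_cycles=10):
    """Take requests from the queue; put failed ones back for a retry.

    todo       -- deque of XML strings, standing in for the RequestDB
    perform    -- callable(operation, lfn, target_se) -> bool, standing in
                  for the call to the Replica Manager
    max_cycles -- safety cap for this demo only
    """
    done = []
    cycles = 0
    while todo and cycles < max_cycles:
        cycles += 1
        xml_text = todo.popleft()
        req = ET.fromstring(xml_text)
        ok = perform(req.get("operation"),
                     req.findtext("lfn"),
                     req.findtext("targetSE"))
        if ok:
            done.append(xml_text)
        else:
            todo.append(xml_text)  # stays in the queue, retried later
    return done
```

A failed request simply goes back to the end of the queue, so one bad transfer does not block the requests behind it.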

Page 9

RAW Upload to Castor

[Diagram labels: LHCb ONLINE SYSTEM, Data Mover, Online Storage, Online RunDatabase, DIRAC @ Online Gateway, RequestDBSvc, RequestDatabase, Transfer Agent, ReplicaManager, RFIOPlugin, rfcp, FC API, ADTDB, LFC, CERN-IT; request states: ToDo, Done; legend: File movement, Request movement]

Page 10

Bulk Data Transfers

gLite File Transfer Service (FTS)
  Provides point-to-point reliable bulk transfers
  Channel architecture
    SURLs at SRM X to SURLs at SRM Y
  Utilizing high throughput dedicated networks
    Network resources pledged to WLCG
    CERN-Tier1s
    Tier1-Tier1 matrix
DIRAC DM System Interfaced to FTS
  Use FTS CLI to submit and monitor jobs
  DIRAC DM System
    Scheduling and placement of transfers
    Preparing source and target SURLs

[Diagram labels: Transfer Agent, ReplicaManager, EGEE, FTS Svc, SRM/G-U-C]
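Preparing source and target SURLs amounts to mapping each LFN onto the SRM endpoint of the source and of the target. A minimal sketch, assuming a SURL is the SE's SRM prefix followed by the LFN: the SE names, hosts and base paths below are made up, and the one-pair-per-line text layout is an assumption chosen for illustration rather than a documented FTS input format.

```python
# Hypothetical SE name -> SRM prefix table; the hosts are placeholders.
STORAGE_ELEMENTS = {
    "SOURCE-SE": "srm://srm.source.example/castor/grid",
    "TARGET-SE": "srm://srm.target.example/pnfs/grid",
}


def surl(se_name, lfn):
    """Build the SURL of an LFN at a given SE (prefix + LFN, an assumption)."""
    return STORAGE_ELEMENTS[se_name] + lfn


def prepare_pairs(lfns, source_se, target_se):
    """Return (source SURL, target SURL) pairs for one transfer job."""
    return [(surl(source_se, lfn), surl(target_se, lfn)) for lfn in lfns]


def to_bulk_text(pairs):
    """Render the pairs one per line, source then target."""
    return "".join(f"{src} {dst}\n" for src, dst in pairs)


pairs = prepare_pairs(["/lhcb/data/run1/a.raw", "/lhcb/data/run1/b.raw"],
                      "SOURCE-SE", "TARGET-SE")
print(to_bulk_text(pairs), end="")
```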

Page 11

Data Driven Production Management

DIRAC DM components developed to perform data driven management
AutoDataTransferDB (AdtDB) contains pseudo file catalogue
  Offers API to manipulate catalogue entries
  Based on ‘transformations’ contained in the DB
Transformations defined for each DM operation to be performed
  Defines source and target SEs
  File mask (based on LFN namespace)
  Number of files to be transferred in each job
  Can select files of given properties and locations
Replication Agent manipulates AdtDB API
  Checks active files in AdtDB
  Applies mask based on file type
  Checks the location of file
  Files which pass mask and match SourceSE selected for transformation
  Once threshold number of files found FTS jobs created
ReplicationAgent logic generalised to support multiple transformation types
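The selection logic on this slide (mask, source SE check, threshold) can be sketched in a few lines. The Transformation fields and function names are hypothetical, the AdtDB is replaced by a plain dictionary mapping each LFN to the set of SEs holding it, and shell-style wildcards stand in for whatever mask syntax the real system uses.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class Transformation:
    """Parameters of one DM operation (field names are illustrative)."""
    source_se: str
    target_se: str
    mask: str           # wildcard pattern over the LFN namespace
    files_per_job: int  # threshold before a transfer job is created


def select_files(active_files, transformation):
    """Keep files that pass the mask and have a replica at the source SE."""
    return sorted(
        lfn for lfn, locations in active_files.items()
        if fnmatchcase(lfn, transformation.mask)
        and transformation.source_se in locations
    )


def build_jobs(active_files, transformation):
    """Group selected files into full jobs; the remainder keeps waiting."""
    selected = select_files(active_files, transformation)
    size = transformation.files_per_job
    n_full = len(selected) // size
    jobs = [selected[i * size:(i + 1) * size] for i in range(n_full)]
    waiting = selected[n_full * size:]
    return jobs, waiting
```

Files that do not yet fill a job stay in the waiting list, which corresponds to the slide's statement that jobs are created only once the threshold number of files is found.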

Page 12

RAW Replication

[Diagram labels: LHCb ONLINE SYSTEM, Data Mover, Online Storage, Online RunDatabase, DIRAC @ Online Gateway, RequestDBSvc, RequestDatabase, Transfer Agent, ReplicaManager, RFIOPlugin, rfcp, FC API, ADTDB, LFC, CERN-IT; request states: ToDo, Done; legend: File movement, Request movement]

When a file is uploaded to Castor it is registered in AdtDB
This is the hook to data driven replication

Page 13

RAW Replication II

[Diagram labels: DIRAC DM System, AdtDB, Replication Agent, RequestDBSvc, RequestDatabase, Transfer Agent, ReplicaManager, WLCG, FTS Svc, SRM/G-U-C, Tier1 SRM, LFC; legend: File Movement, Request movement]

After replication, registration in LFC and ProcessingDB
ProcessingDB drives data driven reconstruction and stripping jobs

Page 14

Reliable Data Management

LHCb dedicated VO Box provided at Tier1s
  DIRAC instance installed
    RequestDB service
    TransferAgent
Provides failover mechanism
  File upload from WN to associated SE
  If fails alternative SE chosen, ‘move’ request put to VO box
Also provided initial mechanism for DST distribution
  DST uploaded to associated Tier1SE
  ‘Replication’ requests put to VOBoxes
Proven capable of 100MB/s integrated across all Tier1s.
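The failover mechanism can be illustrated with a short sketch. The function name, the callables passed in, and the dictionary layout of the 'move' request are all hypothetical; the slide only states that an alternative SE is chosen on failure and that a 'move' request is put to the VO box.

```python
def upload_with_failover(lfn, associated_se, alternative_ses, upload, put_request):
    """Upload to the associated SE, falling back to an alternative.

    upload      -- callable(lfn, se_name) -> bool, stands in for the real upload
    put_request -- callable(request_dict), stands in for the VO box RequestDB
    Returns the name of the SE that received the file, or None.
    """
    if upload(lfn, associated_se):
        return associated_se
    for se_name in alternative_ses:
        if upload(lfn, se_name):
            # The file landed elsewhere: ask for it to be moved to the
            # intended SE later (request layout is an assumption).
            put_request({
                "operation": "move",
                "lfn": lfn,
                "sourceSE": se_name,
                "targetSE": associated_se,
            })
            return se_name
    return None
```

The job on the worker node can finish as soon as the file is safe somewhere; getting it to its intended location becomes the Transfer Agent's problem.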

Page 15

Use of Resources

During LHCb’s DC06 DIRAC’s DM System
  Stored 3.8M files at CERN + Tier1s
    292TB of tape
    262TB of disk
  + registration in the LCG File Catalogue

Site (Country)  Tape Files  Tape Used (TB)  Disk Files  Disk Used (TB)
CERN (CH)          1660232          191.54      316503           59.10
CNAF (IT)           132229           19.68      160577           29.73
GRIDKA (DE)         110001           14.59      170008           31.83
IN2P3 (FR)           31604            4.93      294214           47.36
PIC (ES)            142520           19.37      128722           23.89
RAL (UK)            274816           37.12      215108           40.74
SARA (NL)            40849            5.70      161145           30.33

[Pie chart: Disk Usage (TB) by site, total 262TB; values as in the Disk Used (TB) column]
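The headline totals can be cross-checked against the per-site table with a few lines of Python. The numbers are copied from the table on this slide; only the variable and function names are new.

```python
# (site, tape files, tape used TB, disk files, disk used TB), from the table
ROWS = [
    ("CERN (CH)",   1660232, 191.54, 316503, 59.10),
    ("CNAF (IT)",    132229,  19.68, 160577, 29.73),
    ("GRIDKA (DE)",  110001,  14.59, 170008, 31.83),
    ("IN2P3 (FR)",    31604,   4.93, 294214, 47.36),
    ("PIC (ES)",     142520,  19.37, 128722, 23.89),
    ("RAL (UK)",     274816,  37.12, 215108, 40.74),
    ("SARA (NL)",     40849,   5.70, 161145, 30.33),
]


def totals(rows):
    """Sum file counts and volumes over all sites."""
    return {
        "files": sum(r[1] + r[3] for r in rows),
        "tape_tb": round(sum(r[2] for r in rows), 2),
        "disk_tb": round(sum(r[4] for r in rows), 2),
    }


print(totals(ROWS))
```

The sums come to 3,838,528 files, 292.93 TB of tape and 262.98 TB of disk, consistent with the quoted 3.8M files, 292TB and 262TB if those figures were rounded down.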

Page 16

Summary

DIRAC core DMS extensible, reliable, redundant
  VO specific resources plug-able
  5 years of experience managing LHCb data
Data driven operations to meet LHCb computing model
  Initial upload of RAW physics files
  Replication to Tier1s
  Broadcast of DSTs
In the last year DIRAC DMS handled 3.8M files/replicas
  292TB of tape
  262TB of disk

Page 17

Questions…?