Distributed Analysis System in the ATLAS Experiment
Minsuk Kim, University of Alberta
24 Jun 2008, KISTI Seminar


Page 1: Distributed Analysis System in the ATLAS Experiment Minsuk Kim University of Alberta 24 Jun 2008 KISTI Seminar

Distributed Analysis System in the ATLAS Experiment

Minsuk Kim
University of Alberta

24 Jun 2008

KISTI Seminar

Page 2: Outline

Large Hadron Collider (LHC)
• ATLAS Experiment and Computing

Distributed Analysis Model in ATLAS
• Grid Infrastructure

Distributed Analysis Tools
• Ganga and Pathena

Conclusions

Page 3: LHC at CERN

Page 4: Proton-Proton Interaction

Extracting interesting physics from this massive data sample is a big challenge.

The real-time data selection process reduces the event rate to 100 to 200 events/s, i.e. about 10^9 events/yr.

Page 5: ATLAS Experiment

• 37 countries
• 167 institutes
• ~2000 collaborators (Canada ~4%)

[Detector figure; labelled components include Pixels, SCT, TRT]

Page 6: ATLAS Computing

• 2x10^9 events/yr and 1 event ~ 1.6 MB
• ATLAS will record about 3.2 Petabytes of data per year (3.2 million GB), plus 2-3 times as much simulated data
• This invites comparisons like "if we wrote one year's data on DVDs it would make a stack roughly as high as the CN Tower (553 m)"
   - DVD thickness: 1.2 mm
   - DVD capacity: 8.5 GB (1-side, 2-layer)
   - 3.2 PB / 8.5 GB = 376470 discs = 452 m
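The arithmetic behind these figures can be reproduced in a few lines. This is only a back-of-the-envelope sketch of the slide's numbers: the function names are illustrative, and decimal units (1 GB = 1000 MB, 1 PB = 10^6 GB) are assumed, matching the slide's "3.2 million GB".

```python
# Back-of-the-envelope check of the slide's numbers (decimal units assumed).


def yearly_volume_gb(events_per_year, event_size_mb):
    """Raw data volume per year in GB, assuming 1 GB = 1000 MB."""
    return events_per_year * event_size_mb / 1000.0


def dvd_stack(volume_gb, capacity_gb=8.5, thickness_mm=1.2):
    """Return (number of full DVDs, stack height in metres) for volume_gb."""
    discs = int(volume_gb // capacity_gb)
    height_m = discs * thickness_mm / 1000.0
    return discs, height_m


volume_gb = yearly_volume_gb(2e9, 1.6)  # 2x10^9 events/yr at ~1.6 MB each
discs, height_m = dvd_stack(volume_gb)
print("%.1f PB/yr, %d discs, %.0f m" % (volume_gb / 1e6, discs, height_m))
```

With the slide's inputs the stack comes out somewhat below the 553 m tower, which is consistent with the "roughly" in the quoted comparison.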

Page 7: LHC Computing Grid (LCG)

• One massive computing centre is not possible
• Farm out data around the world using GRID technology
• 12 Tier-1 for raw processing
   - 1 in Canada: TRIUMF
• >100 Tier-2 for analysis
   - 5 sites in Canada
   - West: UVic, SFU, Alberta
   - East: Toronto, McGill

[Map labels: UVic, SFU, UofA, UofT, McGill]
[Network diagram labels: CERN, TRIUMF, ALBERTA; link speeds 5 Gbit/s and 1 Gbit/s]

Page 8: Distributed Analysis Model in ATLAS

Page 9: ATLAS Data Replication and Distribution

[Diagram labels: Event Filter; Tier-0; Tier-1; Tier-2; Many Tier-3; CERN Analysis Facility; 1st pass calibration; Reconstruction; 24h Data Export; Data Reprocessing; MC Production; User Analysis]

Page 10: ATLAS Event Data Model

Refining the data by: adding higher-level info, skimming, thinning, slimming

Page 11: ATLAS Grid Infrastructure

Different kinds of grid environment, based on 3 grids
• WLCG/EGEE (Enabling Grids for E-sciencE)
• OSG (Open Science Grid)
• NG (NorduGrid)

Grids have differences in
• Middle-ware
• Replica catalogs to store data
• Software tools to submit jobs

However, these differences are hidden from the ATLAS user

Page 12: Distributed Analysis Model

The distributed analysis model is based on the ATLAS computing model
• Data is distributed in Tier-1/Tier-2 facilities by default
   - available 24/7
• User jobs are sent to the data
   - large input datasets (100 GB up to several TB)
• Results must be made available to the user
   - potentially already during processing
• Data is annotated with meta-data, with bookkeeping in catalogs

Page 13: Distributed Analysis Model

Need for: Distributed Data Management (DDM)
• Managed by the DDM system DQ2 (Don-Quijote 2)
• System based on datasets, which are collections of files
   - a file exists in the context of datasets
• Automated file management, distribution and archiving throughout the whole grid using a Central Catalog, FTS, LFCs
• Random access needs a pre-filtering of data of interest
   - e.g. trigger or ID streams or TAGs (event-level meta-data)

Page 14: Distributed Data Management

DQ2: Data management system for all distributed ATLAS data
• Supports all three ATLAS Grid flavors
• Manages all data flows (EF → Tier0 → Grid Tiers → Institutes → users)
• Moves data between grid sites, query and retrieval of data
• Data is grouped into datasets, based on meta-data, like run period
   - dataset name: Project.NNNN.PhRef.ProductionStep.Format.Version
   - User-defined datasets should have the prefix user.FirstnameLastname

DQ2 end-user tools (the DQ2 dataset browser)
• dq2_ls to list datasets matching a given pattern
• dq2_get to copy data from local storage or over the grid
• dq2_put to create user-defined datasets

dq2 can see only Tier1/Tier2 SEs (castor, dCache, DPM), so files need to be copied to an SE first and then registered to the DQ2 system
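The dataset naming convention lends itself to a small parser. The sketch below is illustrative only and is not part of the DQ2 tools: the field names are simply lower-cased versions of the slide's pattern, the helper functions are hypothetical, and the example names are the ones quoted elsewhere in this talk.

```python
from collections import namedtuple

# Field names mirror the slide's pattern:
# Project.NNNN.PhRef.ProductionStep.Format.Version
DatasetName = namedtuple(
    "DatasetName",
    ["project", "number", "phref", "production_step", "data_format", "version"],
)


def parse_dataset_name(name):
    """Split a dataset name into its six dot-separated fields."""
    parts = name.split(".")
    if len(parts) != 6:
        raise ValueError(
            "expected 6 dot-separated fields, got %d: %r" % (len(parts), name)
        )
    return DatasetName(*parts)


def is_user_dataset(name):
    """User-defined datasets carry the prefix user.FirstnameLastname."""
    return name.startswith("user.")


ds = parse_dataset_name("fdr08_run1.0003067.StreamJet.merge.AOD.o1_r8_t1")
print(ds.project, ds.production_step, ds.data_format, ds.version)
print(is_user_dataset("user.MinsukKim.fdr08_run1"))
```

User-defined names such as user.MinsukKim.fdr08_run1 do not follow the six-field pattern, so the sketch checks them by prefix rather than parsing them.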

Page 15: Job Flow in the WLCG/EGEE Grid

Job goes to the data

Page 16: Grid Job Submission

Naive assumption: Grid ≈ large batch system
• Provide a complicated job configuration (JDL) file
• Find suitable Athena software, installed as distribution kits in the Grid
• Locate the data on different storage elements
• Job splitting, monitoring and book-keeping
• etc.

Need for automation and integration of various different components

Page 17: Distributed Analysis – Current

ATLAS offers several ways to do distributed analysis
• Data from the MC Production System is currently consolidated by the DDM-operations team on all Tier1 and then all Tier2 sites
• The analysis model foresees Athena analysis of AODs/ESDs and interactive use of Athena-aware ROOT tuples

[Diagram labels: CE; SE/RLS; DQ2; User with valid grid certificate]

Simplifying use of the Grid: easy-to-use frontends for job definition and management, implemented in Python

Page 18: Distributed Analysis

How to combine all these: Job scheduler/manager GANGA

Page 19: Distributed Analysis with Ganga & Pathena

Page 20: What is Ganga (an ATLAS/LHCb joint project)

A user-friendly job definition and management tool
• Allows simple switching between testing and large-scale data processing
• Readily extended/customized to meet the needs of different users

A job is constructed from a set of building blocks
• Specify which software is to be run (application)
• Specify the processing system (backend), input/output and so on

Pluggable framework!

[Diagram labels: Mandatory; optional]

Page 21: Plug-in based design

• Eases the user's experience in switching between different technologies
• Concentrates the developer's effort in a specific domain

[Diagram labels: Common interface; Specific implementation]
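The split between a common interface and specific implementations can be illustrated with a few lines of plain Python. This is a minimal sketch of the general plug-in pattern, not Ganga's actual code: every class name here (SketchBackend, SketchLocal, SketchGrid, SketchJob) is hypothetical, and "submission" only returns a descriptive string.

```python
from abc import ABC, abstractmethod


class SketchBackend(ABC):
    """Common interface: every backend plug-in must provide submit()."""

    @abstractmethod
    def submit(self, command):
        """Hand the command to the processing system and return a status string."""


class SketchLocal(SketchBackend):
    """Specific implementation: pretend to run on the local machine."""

    def submit(self, command):
        return "local: ran '%s' on this machine" % command


class SketchGrid(SketchBackend):
    """Specific implementation: pretend to queue the command for grid sites."""

    def __init__(self, sites):
        self.sites = list(sites)

    def submit(self, command):
        return "grid: queued '%s' for %s" % (command, ", ".join(self.sites))


class SketchJob:
    """A job is assembled from building blocks; only the backend differs here."""

    def __init__(self, command, backend):
        self.command = command
        self.backend = backend

    def submit(self):
        return self.backend.submit(self.command)


job = SketchJob("athena testJetReco.py", SketchLocal())
print(job.submit())                               # test locally first
job.backend = SketchGrid(["TRIUMF", "ALBERTA"])   # then switch the backend only
print(job.submit())
```

The job definition stays the same when moving from a local test to the grid; only the backend object is swapped, which is the point of the plug-in design.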

Page 22: Ganga How-to

• Each job has a jobid, which specifies the location of the repository and workspace in gangadir
• By default a job is sent to locations holding the complete dataset, but often a dataset is only incomplete (0 up to all files); make sure a dataset is present at a site by using the option j.inputdata.min_num_files=N (N>0)
   - it is often advisable to force a job to a particular site or a subset of sites
• Provides two user interface clients and a scripting mode (PANDA-style job submission)

ganga athena --inDS [input dataset] --outputdata [output] --lcg --ce [nodes] testJetReco.py

Job definition within the IPython shell (RDO → AOD):

    j = Job()
    j.name = 'test'
    j.application = Athena()
    j.application.atlas_release = '13.0.30'    # j.application.atlas_production = '13.0.30.3'
    j.application.option_file = 'testJetReco.py'
    j.application.max_events = 1000
    j.inputdata = DQ2Dataset()    # or ATLASLocalDataset()
    j.inputdata.dataset = 'misal1_csc11.005012.J3_pythis_jetjet.digit.RDO.v12003103_tid016367'
    j.outputdata = ATLASOutputDataset()    # or DQ2OutputDataset()
    j.outputdata.outputdata = ['AOD.pool.root']
    j.inputsandbox = ['PDGTABLE.MeV']    # also j.outputsandbox
    j.backend = LCG()    # or NG(), LSF(), Local()
    j.backend.requirements.sites = ['TRIUMF', 'ALBERTA']    # since 4.4.x
    j.submit()
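The effect of a minimum-file threshold on site selection can be pictured with a toy example. The sketch below is an assumption about the idea only and is not how Ganga implements it: candidate_sites is a hypothetical helper, and the replica counts are made up for illustration.

```python
def candidate_sites(files_at_site, total_files, min_num_files=None):
    """Return the sites that hold enough files of a dataset.

    With min_num_files=None only complete copies qualify (the default
    behaviour described on the slide); otherwise any site holding at
    least min_num_files files qualifies.
    """
    threshold = total_files if min_num_files is None else min_num_files
    if threshold <= 0:
        raise ValueError("threshold must be > 0")
    return sorted(site for site, n in files_at_site.items() if n >= threshold)


# Made-up replica counts for a 40-file dataset.
replicas = {"TRIUMF": 40, "ALBERTA": 25, "CERN": 0}

print(candidate_sites(replicas, total_files=40))                    # complete copies only
print(candidate_sites(replicas, total_files=40, min_num_files=10))  # partial copies allowed
```

Requiring N > 0 excludes sites that are listed for a dataset but hold none of its files.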

Page 23: Usage of Ganga at remote sites

• Used at ~70 sites for 4 months of 2008
• Over 1250 unique users since 2007
• Canada ~1%
• Ganga monitoring under http://www.cern.ch/ganga
   - Usage statistics, and usage report on the Grid (EGEE) with GangaRobot
   - Jobs monitored by Dashboard (user, site, ce, application, and so on)

Page 24: Ganga Activities

[Figure labels: Main Users; Other activities; HARP; Garfield]

Page 25: Pathena (a Python script for access to OSG resources via the Panda system)

A user job is composed of sub-jobs
• One buildJob receives source files from the user, compiles them and produces libraries, which are stored to the storage
• Many runAthena jobs receive the libraries and run Athena
   - completion of buildJob triggers runAthena
   - output files are added to an output dataset
   - DDM moves the dataset to area

[Diagram annotations: Job splitter; A unique PandaID]

Example output of a pathena submission:

    extracting run configuration
    ConfigExtractor > Input=POOL
    ConfigExtractor > Output=AANT AANTupleStream AANT
    archive sources
    archive InstallArea
    post sources/jobO
    query files in dataset:fdr08_run1.0003067.StreamJet.merge.AOD.o1_r8_t1
    submit
    ======================================
     JobID  : 39
     Status : 0
      > build
        PandaID=8558262
      > run
        PandaID=8558263
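The relationship between the single buildJob and the runAthena sub-jobs can be sketched as a simple dependency: the run jobs stay waiting until the build has finished. This is an illustrative model only, not Panda's scheduler. The class and function names are hypothetical, and handing out consecutive IDs is an assumption suggested by the example output (build 8558262, run 8558263).

```python
class SketchSubJob:
    """One sub-job with its own ID and a simple status string."""

    def __init__(self, panda_id, kind):
        self.panda_id = panda_id
        self.kind = kind          # "build" or "run"
        self.status = "waiting"


def run_user_job(n_run_jobs, first_id, build_ok=True):
    """Create one build job and n_run_jobs run jobs, then resolve the build.

    The run jobs are activated only if the build finished successfully.
    """
    build = SketchSubJob(first_id, "build")
    runs = [SketchSubJob(first_id + 1 + i, "run") for i in range(n_run_jobs)]

    build.status = "finished" if build_ok else "failed"
    if build.status == "finished":
        for run in runs:
            run.status = "activated"   # completion of the build triggers the runs
    return build, runs


build, runs = run_user_job(n_run_jobs=3, first_id=8558262)
print(build.panda_id, build.status)
print([(r.panda_id, r.status) for r in runs])
```

Compiling once in the build job and reusing the libraries in every run job avoids repeating the compilation for each sub-job.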

Page 26: Pathena How-to

Not a special job-option file (can run locally):

    pathena AnalysisSkeleton_jetTrigger.py \
        --outDS user.MinsukKim.fdr08_run1 \
        --inDS fdr08_run1.0003067.StreamJet.merge.AOD.o1_r8_t1 \
        --site ALBERTA --inputFileList fdr08_run1.list \
        --libDS LAST

Pathena options (just examples; FDR-1 @ Tier-2)

    outputDS        output (name convention)
    inDS            name of input dataset
    site            job to OSG by default, possible to LCG (if AUTO, job to site w/ most data)
                    future: a job submission to best site based on data/CPUs availability
    split           number of sub-jobs to which an analysis job is split (i.e. how many CPUs)
    nFiles          use a limited number of files in the input dataset (if not, all w/ auto split)
    inputFileList   filename which contains a list of files to be run in the input dataset
    libDS           library dataset (e.g. "LAST" means what the last build used)
    command         one-liner (e.g. "EvtMax=3")
    noSubmit        don't submit a job (error checking for script/dataset/site)
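How a file limit and a split count interact can be shown with a toy splitter. This is a hedged sketch, not pathena's algorithm: split_files is a hypothetical helper, round-robin assignment is an arbitrary choice, and the file names are invented.

```python
def split_files(files, n_subjobs, n_files=None):
    """Distribute input files over sub-jobs.

    n_files limits how many files of the dataset are used at all;
    n_subjobs is the number of sub-jobs the remaining files are spread over.
    """
    if n_subjobs < 1:
        raise ValueError("n_subjobs must be >= 1")
    selected = list(files) if n_files is None else list(files)[:n_files]
    if not selected:
        return []
    # Never create more sub-jobs than there are files to process.
    n_subjobs = min(n_subjobs, len(selected))
    # Round-robin: sub-job i gets files i, i + n_subjobs, i + 2*n_subjobs, ...
    return [selected[i::n_subjobs] for i in range(n_subjobs)]


files = ["AOD._%05d.pool.root" % i for i in range(1, 8)]  # 7 invented file names
for i, chunk in enumerate(split_files(files, n_subjobs=3)):
    print("sub-job %d: %s" % (i, chunk))
print(split_files(files, n_subjobs=3, n_files=4))
```

Capping the number of sub-jobs at the number of selected files avoids creating empty sub-jobs when the split count exceeds the file count.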

Page 27: Panda Monitor

[Screenshot annotations: Job status/site; Inputs/outputs; Log/debug; ALBERTA; Enter your PandaIDs]

Page 28: Pathena vs. Ganga

                            Pathena                                 Ganga
    Design                  Specialized tool                        Extensible generic tool
    Setup                   Checkout PandaTools                     Ganga must be installed
                            (set PATHENA_GRID_SETUP_SH)             (no need to setup grid environ.)
    Submit jobs: how        Shell command line                      CLIP, Script and GUI
    Submit jobs: where      Grids (OSG,...CERN,TRIUMF,ALBERTA)      Grids, LSF, PBS,… and Local
    Site (dataset) finding  AUTO                                    AUTO
    Input datasets          DQ2                                     DQ2 or Local/castor/dCache
    Get results             DQ2                                     DQ2 or Local/castor/dCache
    Bookkeeping/Retry       With pathena_util                       With ganga
    Monitoring/ErrorLog     Web and local utility                   Web and local utility

LXPLUS/THOR to submit jobs to CERN/TRIUMF/ALBERTA, and to retrieve outputs

Page 29: Conclusion

The generic grid job submission frameworks can be used with DDM/DQ2 to perform Distributed Analysis
• Use both Pathena and Ganga at LXPLUS and THOR clusters
• Submit jobs to CERN, TRIUMF, ALBERTA & OSG/NG sites
• Ganga available with Local, Batch, and Grid systems

Distributed analysis in ATLAS is evolving rapidly
• Many key components like the DDM system have come online (Data Management is a central issue)
• The multi-pronged approach to distributed analysis has encouraged one submission system to learn from another, and has ultimately produced a more robust and feature-rich distributed analysis system

Page 30: Conclusion

Configure once, run anywhere

[Diagram labels: User; Ganga; Pathena; Local; Batch; Grid]

Page 31: Tier-2@ALBERTA

THOR Linux Computing Cluster
• Began around 1998, 42 dual Pentium II/III machines (100 Mb/s Ethernet)
• Beowulf-type, cheaper than S-computers by more than a factor of ten

Current Hardware/Software Configuration (high-speed Gigabit link)
• 3 head-nodes for Cluster/Interactive User, 1 Torque/Maui server node
• Many server nodes for Grid Compute Element
• 74 dual processor compute nodes, 250 work nodes (200 AMD Opteron)
• 4 data storage nodes (~6 TB of RAID disk storage)
• 4 iSCSI storage arrays (~22 TB) and 2 mass storage tape systems
• Scientific Linux 3, 4 (and Fedora Core 2, 3), various applications & tools

Multi-purpose Computing Facility
• Prototype of PC-based Event Filter sub-farm (high-level trigger system)
• Multiple-serial Monte Carlo production (Tier-2)
• Fully integrated with LCG and Grid Canada projects for distributed computing

Page 32: Backup

Page 33: Middle-ware Features

WLCG/gLite UI (EGEE)
• Job submission via LCG RB (Resource Broker)
• Fast bulk submission with new gLite RB
• LFC (LCG File Catalog)

OSG/PanDA
• An integrated Production ANd Distributed Analysis system
• JobScheduler & Pilots: acquisition of Grid CE resources
• LRC (Local Replica Catalog)

NorduGrid
• ARC middle-ware for job submission
• RLS (Replica Location Service) file catalog
• Now possible for distributed analysis (already in production)

Page 34: Ganga Architecture

Page 35: Ganga Applications and Backends

Page 36: DQ2 Architecture

Three components: Central Dataset Catalog, Local Site Subscription Services, Client Tools

Page 37: What is a Grid?

The key criteria:
• Coordinated distributed resources …
• Uses standard, open, general-purpose protocols and interfaces …
• Delivers non-trivial qualities of service

What is not a Grid?
• A cluster, a network attached storage device, a scientific instrument, a network, etc.
• Each is an important component of a Grid, but by itself does not constitute a Grid