
TRANSCRIPT

Page 1: 3D Project Status

Dirk Duellmann, CERN IT
For the LCG 3D project, http://lcg3d.cern.ch
Meeting with LHCC Referees, March 21st 2006

Page 2: Why an LCG Database Deployment Project?

• LCG today provides an infrastructure for distributed access to file-based data and for file replication
• Physics applications (and grid services) require similar services for data stored in relational databases
  – Several applications and services already use an RDBMS
  – Several sites already have experience in providing RDBMS services
• Goals for a common project as part of LCG:
  – Increase the availability and scalability of LCG and experiment components
  – Allow applications to access data in a consistent, location-independent way
  – Allow existing DB services to be connected via data replication mechanisms
  – Simplify shared deployment and administration of this infrastructure during 24x7 operation
• Scope set by the PEB: Online - Offline - Tier sites

Page 3: 3D Participants and Responsibilities

• LCG 3D is a joint project between
  – Service users: experiments and grid s/w projects
  – Service providers: LCG tier sites, including CERN
• The project itself has (as all projects do) limited resources (2 FTE)
  – Mainly coordinates requirement discussions, plus testbed and production configuration, setup and support
  – Relies on experiments/projects to define and validate their application function and requirements
  – Relies on sites for local implementation and deployment of the testbed and production setup

Page 4: Online-Offline Connection

A well-documented schema was reported at the last LCG3D Workshop.

Artwork by Richard Hawkings
Slide: A. Vaniachine

Page 5: (image-only slide, no transcript text)

Page 6: (image-only slide, no transcript text)

Page 7: Offline FroNTier Resources/Deployment

• Tier-0: 2-3 redundant FroNTier servers
• Tier-1: 2-3 redundant Squid servers
• Tier-N: 1-2 Squid servers
• Typical Squid server requirements (the access path is illustrated in the sketch below):
  – CPU/MEM/DISK/NIC: 1 GHz / 1 GB / 100 GB / Gbit
  – Network: visible to the worker LAN (private network) and the WAN (internet)
  – Firewall: two ports open, for URI (FroNTier Launchpad) access and SNMP monitoring (typically 8000 and 3401 respectively)
• Squid non-requirements:
  – Special hardware (although high-throughput disk I/O is good)
  – Cache backup (if a disk dies or is corrupted, start from scratch and reload automatically)
• Squid is easy to install and requires little ongoing administration

[Diagram: the FroNTier Launchpad (Tomcat plus Squid) at Tier 0 talks to the database via JDBC; Squid caches at Tier 1 and Tier N sites reach it over http.]

Slide: Lee Lueking
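To make the access path concrete, the following is a minimal Python sketch of a client fetch routed through a local Squid cache. The host names, servlet path and query string are illustrative assumptions (a real FroNTier client library encodes the SQL query into the URL); only the use of port 8000 for launchpad access comes from the slide.

```python
# Illustrative sketch: fetching data through a local Squid cache sitting
# in front of a FroNTier launchpad. Host names and the query path are
# hypothetical; the real FroNTier client encodes SQL into the request URL.
import urllib.request

# The local Tier-N Squid; port 8000 matches the "URI access" port above.
SQUID_PROXY = "http://squid.example-site.org:8000"  # hypothetical host

# A FroNTier launchpad URL at Tier 0 (hypothetical host and servlet path).
LAUNCHPAD_URL = "http://frontier.example-t0.org:8000/Frontier/query?encoded_sql=..."

# Route the request through the Squid proxy, so repeated queries are
# served from the cache instead of hitting the central database.
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({"http": SQUID_PROXY}))

with opener.open(LAUNCHPAD_URL, timeout=30) as response:
    payload = response.read()
    # Squid reports cache hits/misses in the X-Cache response header.
    print(f"{len(payload)} bytes, cache status: "
          f"{response.headers.get('X-Cache', 'unknown')}")
```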

Page 8: Experiment Architectures - Commonalities & Differences

• Focus on the common part of the experiment plans
  – Oracle database setup: online (all), offline (all), tier 1 (ATLAS/LHCb)
  – MySQL is also mentioned for T0 and T1, as an addition, not as a replacement
  – Distribution online-offline:
    • ATLAS, CMS, LHCb: Oracle Streams
    • ALICE: collect/transfer data via ALICE software
  – Distribution tier 0 - tier 1:
    • ATLAS, LHCb: Oracle Streams
    • CMS: FroNTier (ATLAS interested)
    • ALICE: file transfer and conditions lookup via the file catalog
  – Tier 1 - tier 2:
    • ATLAS: based on MySQL/SQLite files
    • ALICE/CMS/LHCb: no tier 2 database service (apart from grid services)

Page 9: LCG 3D Service Architecture

[Architecture diagram: the online DB (autonomous, reliable service) feeds T0 (autonomous, reliable service); the T1 sites form a database backbone with all data replicated and a reliable service; T2 sites hold a local database cache with subset data and a data-only local service. Distribution technologies: Oracle Streams, http caching (Squid), cross-DB copy, and MySQL/SQLite files. Access at Tier 1/2 is read-only, at least initially.]
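For readers unfamiliar with Oracle Streams, the sketch below indicates roughly what the source-side capture and propagation setup involves, driven here from Python via the cx_Oracle module. All schema, queue, user and site names are illustrative assumptions, and a real 3D configuration needs further steps (supplemental logging, instantiation at the destination, an apply process on the Tier 1 database) that are omitted.

```python
# Rough sketch of the source-side part of one-way Oracle Streams
# replication (T0 -> T1 distribution), driven via cx_Oracle. All names
# are illustrative; a production setup additionally needs supplemental
# logging, instantiation, and an apply process at the destination.
import cx_Oracle

conn = cx_Oracle.connect("strmadmin", "secret", "t0db")  # hypothetical credentials/DSN
cur = conn.cursor()

# Create the queue that buffers captured change records.
cur.callproc("DBMS_STREAMS_ADM.SET_UP_QUEUE", keywordParameters={
    "queue_table": "strmadmin.capture_qt",
    "queue_name":  "strmadmin.capture_q",
})

# Capture changes for a (hypothetical) conditions schema; include_dml
# defaults to TRUE and include_ddl to FALSE, i.e. DML-only replication.
cur.callproc("DBMS_STREAMS_ADM.ADD_SCHEMA_RULES", keywordParameters={
    "schema_name":  "cond_data",
    "streams_type": "capture",
    "streams_name": "capture_cond",
    "queue_name":   "strmadmin.capture_q",
})

# Propagate the captured changes towards a Tier 1 apply queue over a
# database link (link name illustrative).
cur.callproc("DBMS_STREAMS_ADM.ADD_SCHEMA_PROPAGATION_RULES", keywordParameters={
    "schema_name":            "cond_data",
    "streams_name":           "prop_to_t1",
    "source_queue_name":      "strmadmin.capture_q",
    "destination_queue_name": "strmadmin.apply_q@t1_site",
    "source_database":        "T0DB",
})
conn.commit()
```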

Page 10: LCG 3D Replication Testbed

• Since last summer: databases at ASCC, CERN, CNAF, GridKA, (FNAL), IN2P3, RAL
• Many replication tests in progress
  – Offline -> T1:
    • COOL/Streams ATLAS: Stefan Stonjek (CERN, RAL, Oxford?)
    • COOL/Streams LHCb: Marco Clemencic (CERN, RAL, GridKA?)
    • POOL/FroNTier CMS: Lee Lueking (CERN and several T1/T2 sites)
    • AMGA/Streams: Birger Koblitz (CERN -> CERN)
    • AMI/Streams: Solveig Albrandt (IN2P3 -> CERN)
    • LFC/Streams: work plan proposed, starting with IN2P3
    • VOMS/Streams: work plan proposed, starting with CNAF
  – Online -> offline:
    • Conditions/Streams CMS: Saima Iqbal (functional testing)
    • COOL/Streams ATLAS: (Gancho Dimitrov) server setup, networking configuration with the pit network
    • LHCb: planning with LHCb online
• Coordination between sites/experiments during the weekly 3D meetings
  – Status: successful functional tests; now ramping up volume/load
  – Need experiment involvement to define the target scale

Page 11: Last Year - Distribution Technology Studies & Results

• FroNTier has been integrated into the LCG software framework
  – The POOL plug-in is used by CMS
  – Work plan for COOL/FroNTier by ATLAS (with ATLAS manpower)
• FroNTier/POOL in the 3D testbed has been used successfully by CMS from several tier 1 and tier 2 sites
  – Squid caches were indeed quick to set up and easy to deploy
• Streams functionality has been confirmed by a number of tests
  – With COOL in ATLAS and LHCb
    • LHCb: Streams functionality shown; the Streams overhead is negligible compared to the intrinsic (COOL client) performance
  – With POOL in CMS
    • Calibration data written online can be streamed to the offline database and picked up as C++ objects through POOL/ORA (direct DB connection) or POOL/FroNTier (Squid-cached connection)

Page 12: Distribution Technologies - Move to Production Deployment

• Moved on from functionality to production deployment & scalability
  – Finalize the technology report covering the database service, Streams and FroNTier
• Further decoupling between Tier 0 and Tier 1 sites
  – Streams data capture is now done on a separate machine based on log files, so e.g. WAN problems have no impact on the T0 database
• Further decoupling among Tier 1 sites
  – Prototyping with Oracle decoupled queues for functional sites (changes leave the CERN capture machine with little latency) and problem sites (the change queue is kept on disk for a few days)
• T0 FroNTier servers: high availability / scaling
  – 3-node production setup with DNS load balancing and client failover (sketched below)
  – Backend: the 4-node experiment database cluster
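The client-side half of that scheme can be pictured as follows: resolve the DNS alias to the full node list, then try each node in turn. A minimal sketch, with a hypothetical alias name and query path:

```python
# Minimal sketch of client-side failover against a DNS round-robin alias,
# as in the 3-node T0 FroNTier setup. Alias name and path are hypothetical.
import socket
import urllib.request

ALIAS = "frontier.example-t0.org"  # DNS alias resolving to all server nodes
PORT = 8000

def fetch_with_failover(path: str) -> bytes:
    # DNS returns the node addresses in rotated order, which spreads load
    # across the servers; dict.fromkeys deduplicates but keeps that order.
    infos = socket.getaddrinfo(ALIAS, PORT, proto=socket.IPPROTO_TCP)
    addresses = list(dict.fromkeys(info[4][0] for info in infos))
    last_error = None
    for ip in addresses:
        try:
            # Try each node in turn; a dead node just costs one timeout.
            with urllib.request.urlopen(f"http://{ip}:{PORT}{path}",
                                        timeout=10) as resp:
                return resp.read()
        except OSError as exc:  # urllib's URLError is a subclass of OSError
            last_error = exc
    raise ConnectionError(f"all {ALIAS} nodes failed") from last_error

# Example (illustrative path): data = fetch_with_failover("/Frontier/query")
```

Because the alias rotates the address order between lookups, load spreads across the nodes while a dead node merely costs each client one timeout before failover.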

Page 13: Building Block for Tier 0/1 - Oracle Database Clusters

• Two or more dual-CPU nodes
• Shared storage (e.g. FC SAN)
• Scales CPU and I/O operations (independently)
• Transparent failover and s/w patches
• LHC database services are deployed on RAC
• All 3D production sites have agreed to set up a RAC cluster

Page 14: CERN Hardware Evolution for 2006

Current state: ALICE: none; ATLAS: 2-node offline plus a 2-node online test; CMS: 2-node; LHCb: 2-node; Grid: 2-node; 3D: none; Non-LHC: none; Validation: 2x2-node plus a pilot on a disk server.

Proposed structure in Q2 2006: ALICE: 2-node; ATLAS: 4-node; CMS: 4-node; LHCb: 4-node; Grid: 4-node; 3D: 2-node; Non-LHC: 2-node (PDB replacement); Validation: three 2-node valid/test setups and a 2-node pilot, with Compass and online instances still open questions.

• Linear ramp-up budgeted for hardware resources in 2006-2008

• Planning next major service extension for Q3 this year

Slide: Maria Girone

Page 15: FroNTier Production Configuration at Tier 0

Squid runs in http-accelerator mode (as a reverse proxy server).

Slide: Luis Ramos
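In http-accelerator (reverse proxy) mode the cache answers on behalf of the origin server, instead of acting as a forward proxy for the clients. The toy Python sketch below illustrates the idea only; it is not Squid, the backend host name is hypothetical, and error handling and cache expiry are omitted.

```python
# Toy illustration of reverse-proxy ("http-accelerator") mode: the front
# server receives client requests and forwards cache misses to the origin
# (here, a hypothetical FroNTier Tomcat backend). A teaching sketch, not
# a substitute for Squid: no expiry, no error handling.
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

BACKEND = "http://tomcat-backend.example-t0.org:8080"  # hypothetical origin
cache: dict[str, bytes] = {}  # naive in-memory cache

class AcceleratorHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path not in cache:  # cache miss: fetch from the origin
            with urllib.request.urlopen(BACKEND + self.path, timeout=30) as r:
                cache[self.path] = r.read()
        body = cache[self.path]
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Clients talk to this front end exactly as if it were the origin server.
    HTTPServer(("", 8000), AcceleratorHandler).serve_forever()
```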

Page 16: LCG Database Deployment Plan

• After the October '05 workshop, a database deployment plan was presented to the LCG GDB and MB
  – http://agenda.cern.ch/fullAgenda.php?ida=a057112
• Two production phases
  – April - September '06: partial production service
    • Production service (parallel to the existing testbed)
    • H/W requirements defined by experiments/projects
    • Based on Oracle 10gR2
    • Subset of LCG tier 1 sites: ASCC, CERN, BNL, CNAF, GridKA, IN2P3, RAL
  – October '06 onwards: full production service
    • Adjusted h/w requirements (defined at the summer '06 workshop)
    • Remaining tier 1 sites join in: PIC, NIKHEF, NDGF, TRIUMF

Page 17: Tier 1 Progress

• Sites largely on schedule for a service start at the end of March
  – H/W either already installed (BNL, CNAF, IN2P3) or delivery of the order expected shortly (GridKA, RAL)
  – Some problems with Oracle Clusters technology were encountered, and solved!
  – Active participation from sites: a DBA community is building up
• First DBA meeting, focusing on RAC installation, setup and monitoring, hosted by Rutherford and scheduled for the second half of March
• Need to involve the remaining Tier 1 sites now!
  – Established contact with PIC, NIKHEF/SARA, NDGF, TRIUMF to follow workshops, email and meetings
• Next workshop on March 23rd, hosted by RAL
  – Focus: finalizing the DB server and monitoring setup at T0 and T1

Page 18: Progress on Open Issues

• Open
  – X.509 (proxy) certificates: supported by Oracle?
    • Investigating fallback solutions
• Closed
  – S/W and support licenses for Tier 1
    • Close to a formal agreement
  – Instant Client distribution within LCG
    • Agreed with Oracle
  – FroNTier application server support
    • During the initial phase (March-September) CMS has proposed to support the tomcat/frontier/squid setup

Page 19: LCG Software Progress

• LCG applications are now based on CORAL
  – Includes the retry and failover required for a reliable DB service
    • DB lookup based on an XML list of databases (see the sketch after this list)
    • Prototyping integration with LFC with the CAT team (India)
  – POOL includes a production version of the FroNTier plug-in
    • Concrete caching policies
  – S/W on schedule for 2006 deployment
• LCG s/w expected to be stable by the end of February for distributed deployment as part of SC4 or experiment challenges
• Caveats:
  – COOL still has important functionality items on this year's development plan
  – Schema changes will need careful planning for COOL and FroNTier
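The lookup maps a logical service name onto an ordered list of physical replicas, which the client then works through with retries. The sketch below imitates that mechanism in Python; the XML layout and the connect call are illustrative assumptions, not CORAL's actual schema or API.

```python
# Sketch of XML-driven logical-to-physical database lookup with retry and
# failover, in the spirit of the CORAL mechanism described above. The XML
# layout and open_connection() are illustrative, not CORAL's schema/API.
import time
import xml.etree.ElementTree as ET

LOOKUP_XML = """
<servicelist>
  <logicalservice name="conditions">
    <service connection="oracle://t1-db.example.org/cond" />
    <service connection="oracle://t0-db.example.org/cond" />
  </logicalservice>
</servicelist>
"""

def physical_replicas(logical_name: str) -> list[str]:
    """Return the ordered replica list for a logical service name."""
    root = ET.fromstring(LOOKUP_XML)
    for svc in root.iter("logicalservice"):
        if svc.get("name") == logical_name:
            return [s.get("connection") for s in svc.iter("service")]
    raise KeyError(logical_name)

def connect_with_failover(logical_name: str, retries: int = 3):
    """Try each replica in order, retrying rounds with a short back-off."""
    for attempt in range(retries):
        for url in physical_replicas(logical_name):
            try:
                return open_connection(url)  # hypothetical driver call
            except ConnectionError:
                continue                     # fail over to the next replica
        time.sleep(2 ** attempt)             # back off before the next round
    raise ConnectionError(f"no replica of '{logical_name}' reachable")

def open_connection(url: str):
    raise ConnectionError(url)  # placeholder: a real driver would connect
```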

Page 20: Experiment Applications Status

• Conditions data is driving the database service size at T0 and T1 (the typical access pattern is sketched below)
  – Event TAGs (may become significant: replication tests and concrete experiment deployment models are needed)
• Framework integration and DB workload generators exist
  – Functionality tested in various COOL and POOL/FroNTier tests
  – T0 performance and replication tests (T0 -> T1) look OK
• Conditions: online -> offline replication is starting now
  – CMS and ATLAS are executing their online test plans
• Progress in defining concrete conditions data models
  – CMS showed the most complete picture (for the Magnet Test)
  – Still some uncertainty about volumes and numbers of clients
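Conditions data in COOL is keyed by interval of validity (IOV): each payload is valid for a time range, and clients ask for the payload valid at a given event time. A minimal sketch of that access pattern, with a flat Python list standing in for the database table (COOL's real data model is considerably richer):

```python
# Minimal sketch of the interval-of-validity (IOV) lookup that dominates
# conditions database workloads: find the payload whose validity interval
# contains a given event time. Times and payload names are illustrative.
import bisect

# (since, until, payload) triples, sorted by 'since'.
IOVS = [
    (0,    100,   "calib-v1"),
    (100,  250,   "calib-v2"),
    (250,  10**9, "calib-v3"),
]
SINCES = [iov[0] for iov in IOVS]

def payload_at(event_time: int) -> str:
    """Return the payload valid at event_time, using binary search."""
    idx = bisect.bisect_right(SINCES, event_time) - 1
    if idx < 0 or not (IOVS[idx][0] <= event_time < IOVS[idx][1]):
        raise LookupError(f"no conditions valid at t={event_time}")
    return IOVS[idx][2]

assert payload_at(120) == "calib-v2"  # usage example
```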

Page 21: Summary

• Significant progress in all areas of the project
• Production schedule defined
  – Phase 1, end of March: ASCC, BNL, CERN, CNAF, IN2P3, RAL
  – Full deployment, end of September: PIC, NIKHEF, NDGF, TRIUMF
• DB clusters deployed at Tier 0 and Tier 1
  – Distribution setup for the common components of the experiment plans
  – Oracle service (online/T0/T1), Streams (online/T0/T1), FroNTier (T1/T2), MySQL/SQLite (T2+)
• Setup progressing on schedule at tier 0 and tier 1 sites
• Applications have moved to performance testing
  – First larger-scale conditions replication tests, with promising results for the Streams and FroNTier technologies
• Main risk: remaining uncertainty about the conditions payload

Page 22: Conclusions

• There is little reason to believe that a distributed database service will move into stable production any quicker than any of the other grid services
• We should start now with larger-scale production operation, to resolve the unavoidable deployment issues
• We need the cooperation of experiments (and sites) to make sure that concrete requests can be validated quickly against the pre-production service