the coral project - ogf.org• coral is widely used as foundation for physics online and offline...
TRANSCRIPT
CERN IT DepartmentCH-1211 Genève 23
Switzerlandwww.cern.ch/it
The CORAL Project
Dirk Düllmann for the CORAL teamOpen Grid Forum, Database Workshop
Barcelona, 4 June 2008
CERN IT DepartmentCH-1211 Genève 23
Switzerlandwww.cern.ch/it
Outline
• CORAL - a foundation for Physics Database Applications in the LHC Computing Grid (LCG)– Project architecture and evolution
• Review of remaining deployment issues– Secure database access– Efficient use of server resources for many concurrent
clients– Software distribution
• CORAL server design study– Integration with existing code base– Server connection and threading model– Network protocol structure– Results from early prototype studies
2
CERN IT DepartmentCH-1211 Genève 23
Switzerlandwww.cern.ch/it
Disclaimer
• This is a development proposal and does therefore contains in several areas:– vapour ware, hand waving– implementation choices may still change
• Still needs discussion beyond pure development– deployment model/resources at T0 and T1
• Still, the aim of this presentation is to convince you that the development– is necessary to insure scalability and high availability of DB
services– design aspects are reasonably well understood– could be introduced on a time scale of end of this year
without major disruption to existing CORAL apps/services– can be done with rather limited resources in the Persistency
Framework X
CERN IT DepartmentCH-1211 Genève 23
Switzerlandwww.cern.ch/it
CORAL DB Foundation
Computing ServicesFiles, Databases, Grid Services
Physics Applicationscommon or experiment specific
CORALRDBMS Access, Indirection, Authentication/Authorisation
COOLValidity Intervals,Conditions Data
POOLObject Storage,
Meta Data and Collections,File Catalogs
3
CERN IT DepartmentCH-1211 Genève 23
Switzerlandwww.cern.ch/it X
“SQL-free” API
coral::ISchema& schema = session.nominalSchema();coral::TableDescription tableDescription;tableDescription.setName( “T_t” );tableDescription.insertColumn( “I”, “long long” );tableDescription.insertColumn( “X”, “double” );schema.createTable( tableDescription);
CREATE TABLE T_t ( I BIGINT, X DOUBLE PRECISION)
CREATE TABLE “T_t” ( I NUMBER(20), X BINARY_DOUBLE)
Example 1: Table creation
Oracle MySQL
CERN IT DepartmentCH-1211 Genève 23
Switzerlandwww.cern.ch/it 4
Vendor independent “SQL-free” API
coral::ITable& table = schema.tableHandle( “T_t” );coral::IQuery* query = table.newQuery();query->addToOutputList( “X” );query->addToOrderList( “I” );query->limitReturnedRows( 5 );coral::ICursor& cursor = query->execute();
SELECT * FROM( SELECT X FROM “T_t” ORDER BY I )WHERE ROWNUM < 6
SELECT X FROM T_t ORDER BY I LIMIT 5
Example 2: Issuing a query
Oracle MySQL
CERN IT DepartmentCH-1211 Genève 23
Switzerlandwww.cern.ch/it
CORAL - Software Components
RDBMS Implementation(oracle)
RDBMS Implementation(sqlite)
RDBMS Implementation(frontier)
RDBMS Implementation(mysql)
Authentication Service(xml)
Authentication Service(environment)
Authentication Service(lfc)
Lookup Service(xml)
Lookup Service(lfc)
Relational Serviceimplementation
Monitoring Serviceimplementation
Connection ServicePools, Retry, Failover
Plug-in libraries, loaded at run-time,interacting only through the interfaces
CommonImplementation
developer-levelinterfaces
CORAL Interfaces(C++ abstract classes
user-level API)
CORAL C++ types(Row buffers, BLOBs,Date, TimeStamp,...)
Client Software
5
CERN IT DepartmentCH-1211 Genève 23
Switzerlandwww.cern.ch/it
Supported Database Back-ends
• Oracle - 10g R2– based on C level interface (OCI)
• performance and flexibility in compiler choice
– all API concepts / optimisations supported natively • bind variables, bulk operations, session pooling
• MySQL - V5 (and 4)– small to medium sized services – standalone development set-ups
• SQLight - V3.4– server-less, file based access– popular transport/replication medium for read-only data
• FroNTier/SQUID - caching layer– read-only access to Oracle via http– multi-level caching between server and client to avoid
network and server latencies for repeated queries6
CERN IT DepartmentCH-1211 Genève 23
Switzerlandwww.cern.ch/it
Applications using CORAL
• LCG Persistency Framework components POOL and COOL– fully replaced direct database dependencies with
CORAL• POOL File catalogs• POOL Event collections• COOL conditions database
• CORAL is also used directly – ATLAS: detector geometry several online– CMS: conditions data, POPCON framework– LHCb: online applications– Astrophysics: LSST project is evaluating CORAL
for storage of large volume image data7
CERN IT DepartmentCH-1211 Genève 23
Switzerlandwww.cern.ch/it
Building reliable Applications and Services with CORAL
• The reliability of a composite service depends crucially on core services (eg DB server)
– but can easily exceed those if a valid retry and fail-over mechanisms are used
– Unfortunately this often happens only late in the application development or is only incomplete
– Many different retry and fail-over strategies complicate deployment of db apps
• CORAL implements a consistent retry and fail-over strategy which can be parametrised according to the application needs
– parameters are kept in one place (eg service look-up file or DB catalog)
• This does not completely eliminate the need for specialised handling of error conditions during data manipulation
• but covers the most frequent problems to obtain a database connection (eg on a overloaded or temporarily unavailable DB server) without loosing the execution state of an application/service
X
CERN IT DepartmentCH-1211 Genève 23
Switzerlandwww.cern.ch/it
Database Authentication
• Different requirements in different areas of physics computing– Online databases - controlled environment, few applications,users and
roles
• User/password pairs still manageable
• Integration between online and offline accounts is an issue
– Offline data production - reconstruction/simulation
• More users (and roles), apps environment shared with grid • DB passwords still widely used, but with security issues
– Grid jobs - reprocessing, end user analysis
• Large user community, spreading many organisation boundaries• User/password pairs can not
– be shipped with job to a worker node
– be maintained for a large virtual organisation
• Certificates (X.509 proxy cert’s) are used by scientific grids– support from commercial vendors still missing
• CORAL support the above via a set of plug-ins coupling to grid services as required
X
CERN IT DepartmentCH-1211 Genève 23
Switzerlandwww.cern.ch/it
Lookup & Connection Service
• Problem: DB clients need to obtain physical DB connection parameters (aka “connect string”)– avoids hard-code credentials, handles DB replica selection,
connection retry and fail-over in case of problems
• Several CORAL plug-ins can be selected at runtime to handle the above in different computing environments– via shell environment (assumes controlled machine access)– via XML files (assumes controlled config file access)– via LFC catalog (certificates protected DB access
credentials), VOMS integration for authorisation
• CORAL maintains a client side pool of connections– network connection is shared by several logical DB sessions– authentication overhead is minimised
8
CERN IT DepartmentCH-1211 Genève 23
Switzerlandwww.cern.ch/it
Remaining Issues - Access Security
• Oracle server does not support grid authentication (X.509 proxy certs) and grid authorisation (eg VOMS groups)– CORAL implemented a secure grid enabled store for database
credentials (db user/pw pairs) based on LFC as a workaround
– being picked up slowly
– additional dependency on LFC service
• Many physics application still exposed to possible security related outages, which severely impact on production– disclosed user/password pairs
– vulnerabilities/exploits of Oracle servers exposed to the internet
• Need progressively to – move to grid certificates for external connections
• apart from applications for which the password distribution (and change - after eg an password exposure) is well controlled
– minimise external exposure of Oracle server ports• eg by moving CORAL based applications to a secure protocol
9
CERN IT DepartmentCH-1211 Genève 23
Switzerlandwww.cern.ch/it
Secure External Connections
• Server protocol is split into– control path: connection requests, coral client requests
– data path: server response eg query result sets
• For external connections– use GSI or SSL for control path
– client certificate required to connect from outside the site network
– data path can stay open (at least for HEP)
– credentials will be evaluated once per session
– VOMS groups from certificate define database user and privileges
– Real database user and password used only by CORAL server
• Online environments may continue with open control path
– coral://some-online-database (LAN connection)
• plus existing coral authentication file
– corals://some-offline-database (WAN connection)• no further information apart from certificate required
10
CERN IT DepartmentCH-1211 Genève 23
Switzerlandwww.cern.ch/it
Cacheable Network Protocol
• Several experiments have introduced caching layers to increase scalability by exploiting many identical queries (r/o)– CMS: FroNTier plug-in to CORAL– ATLAS online: caching MySQL proxy
• Propose to move caching layer in front of the CORAL server– expose caching relevant data in network protocol
• Goal: – allow to create a generic CORAL proxy cache (main user
initially ATLAS online)– allow to multiplex connections in a hierarchical set of cache
servers– CoralProxyServer under development by ATLAS contributors
11
CERN IT DepartmentCH-1211 Genève 23
Switzerlandwww.cern.ch/it
Efficient Database Server Use
• Physics apps tend to use databases with– thousands of concurrent connections/sessions– rather limited data traffic on each connection per time unit
• Result– Oracle clusters manages very many idle processes– waste of server memory and CPU cycles, which could be
used for caching database blocks and executing queries
• Need to bundle connections in front of the database server– the existing Connection Service component does that, but is
limited to pool only connection in the hosting process– need to move this component closer to the database server
12
CERN IT DepartmentCH-1211 Genève 23
Switzerlandwww.cern.ch/it
Remaining Issues - Client s/w Distribution
• CORAL clients link (at runtime) against the specific database access libraries – eg Oracle instant client, MySQL client
• Even though an agreement exists with Oracle to distribute the client software within LCG– configuration management issues because of
compiler dependencies (eg for OCCI)– risk of failing to properly implement export
constraints requested by the Oracle license
• Both could be avoided/reduced by making the CORAL client fully independent of any vendor provided client s/w
X
CERN IT DepartmentCH-1211 Genève 23
Switzerlandwww.cern.ch/it
Stateless vs Stateful
• Database connections are intrinsically stateful– Transaction context spans many operations– Bulk queries/inserts are spread over several client server
communications– Authentication and authorisation state is expensive to obtain
and dominates CPU use of many existing services
• Stateless connection model would lead to – many repeated operations to recover existing state– problems with lifetime management of transaction/query/
security related data in the server if client programs fail
• CORAL server does not keep state beyond transaction boundaries– several servers can run in parallel (eg DNS load balanced)– servers failure looses only the state of uncommitted
transactions13
CERN IT DepartmentCH-1211 Genève 23
Switzerlandwww.cern.ch/it
Reuse in the CORAL Server
DB Server
User Code
CORAL API
Oracle Plug-in
Oracle Client
Connection Pool
Oracle Protocol
User Code
CORAL API
Connection Pool
Proxy Plug-in
CORAL Protocol
Oracle Protocol
Coral Server
Connection Pool
Oracle Plug-in
Oracle Client
Network
new component
Coral Server
Connection Pool
Oracle Plug-in
Oracle Client
Coral Server
Connection Pool
Oracle Plug-in
Oracle Client14
CERN IT DepartmentCH-1211 Genève 23
Switzerlandwww.cern.ch/it
Reuse in the CORAL Server
DB Server
User Code
CORAL API
Oracle Plug-in
Oracle Client
Connection Pool
Oracle Protocol
User Code
CORAL API
Connection Pool
Proxy Plug-in
CORAL Protocol
Oracle Protocol
Coral Server
Connection Pool
Oracle Plug-in
Oracle Client
Network
new component
ProxyCacheProxyCacheProxyCache
Coral Server
Connection Pool
Oracle Plug-in
Oracle Client
Coral Server
Connection Pool
Oracle Plug-in
Oracle Client14
CERN IT DepartmentCH-1211 Genève 23
Switzerlandwww.cern.ch/it
Summary
• CORAL is widely used as foundation for physics online and offline database applications– By introducing a CORAL server in front of the
Oracle/MySQL databases we could remove several of the remaining deployment issues
– CORAL wire protocol is open and designed to facilitate external caching proxies
• The development profits from significant reuse of existing components and which are integrated adiabatically with minimal impact on existing code.
• Production deployment planned for end of this year– Read-only, caching prototype earlier under test
with ATLAS online now
15
CERN IT DepartmentCH-1211 Genève 23
Switzerlandwww.cern.ch/it
Other CORAL Features
• Python access to CORAL user level API – Same semantics as C++ components, but
maintaining Python look-and-feel
• Generic database copy tool– copy full DBs or slices between CORAL supported
back-ends
• Several developments in cooperation with RRCAT, Indore (India)
X