Database Workshop Report

Eva Dafonte Perez

IT-DB

Workshop Aims

• Database Futures Workshop

– 29th-30th May, 2nd edition

– https://indico.cern.ch/event/615499/overview

– 74 participants registered

• Aims

– Discuss future requirements in the database area

– Identify common needs between user communities

– Evaluate new trends & technologies

– Understand how services should evolve/improve to fulfil new requirements

[Figure: timeline of database technologies, axis 1970 to 2010. Labels: Relational Model; Relational Databases (DBMS); Oracle (RSI); Oracle version 2.3 (accelerator control); Distributed Databases; Object Oriented Databases (OODBMS); Object Relational Databases (ORDBMS); PostgreSQL; MySQL; Big Data Analytics; NoSQL; Hadoop; Time Series Databases (TSDB). Service milestones: Database Futures workshop (1st edition); Database On Demand (MySQL, PostgreSQL); Hadoop Service; ElasticSearch; InfluxDB.]

Tracks

• Requirements for Run3&4

– Data volumes, update and access rates; application criticality; HA, replication needs, business continuity, cloud services, virtualization; machine learning; development and deployment process

• Implementations & Technologies

– Various database engines and technologies, innovative database applications; comparisons

• Going beyond relational

– Technologies and use cases: Big Data, Hadoop, Spark, …

Requirements for Run3&4

• Traditional applications will still be key elements in the HL-LHC era

• Growing number of event-indexing database applications and analysis applications

– Based on relational databases and NoSQL technologies

• Overall the relational model is valid

• Oracle is the preferred solution for critical applications

– Exceptions: ALICE (“zoo of solutions”), LHCb (mainly MySQL)

– Cost-effective platform in terms of functionality and performance

– Expertise

– Support is a key factor

Requirements for Run3&4

• Run3&4: larger insert and update rates = database workload increase

– Advanced Oracle 12c features (in-memory, new partitioning, …); migration to Oracle 12cR2 in LS2

– Powerful hardware to improve response times

• Interventions become more difficult to schedule

– Move towards zero downtime

– Fast-switching standby services

• Alternatives to Oracle for smaller projects, to facilitate:

– Collaboration with other institutes

– Open sourcing

Implementations & Technologies

• New systems/evolution: move towards NoSQL solutions

– New accelerator logging service (NXCALS)

– Next generation archiver

– Next generation for Post Mortem event storage and analysis

– Conditions data management system for HEP experiments

– CMS Big Data project

Implementations & Technologies

• Motivation

– Scale out

– Enable data analytics

– Newer technologies more appropriate to solve specific use cases

• No antagonism SQL vs NoSQL anymore

• Risks in the medium term

– Some technologies may attract less interest or disappear

– Difficult to maintain

Implementations & Technologies

• Provenance

– LHCb bookkeeping, CMS analysis, …

– Integrate origin / meta information important for further analysis

• Database on Demand

– Supports MySQL, PostgreSQL (relational) and InfluxDB (time series)

– Backup & Recovery, HA, Monitoring updates

– Working to offer instances in the Technical Network (TN)

– Help to use different DBMS

• Open source tools available to facilitate migration

• DBoD team can be contacted

Going beyond relational

• Data Analytics

– Hadoop, Spark, Sqoop, Impala, HBase, Hive and Pig

• Centralised Elasticsearch service

– Distributed, RESTful search and analytics engine

• Hadoop and Elasticsearch becoming critical to ATLAS

• Growing interest in Time Series databases

– Easier analysis

– Improved storage and ingestion rates

– InfluxDB use cases: DBoD monitoring, IT monitoring

• Streams processing

– Kafka pilot service use cases: accelerator logging service, computing infrastructure monitoring

In general…

• Positive feedback on the database services provided by IT

• Fruitful discussions

• Synergies (even overlaps)

– Collaboration

– Scope to optimise resources for all

• Next similar workshop in 2019 (LS2)

– Given the dynamic nature of the technologies

– Many projects in development
