Otto-von-Guericke-Universität Magdeburg
Faculty of Computer Science
Databases and Software Engineering (DBSE)

Master’s Thesis

Empirical evaluation of state-of-the-art databases on mixed workloads with HTAPBench

Author: Param Pawar
August 8, 2019

Advisors: M.Sc. Gabriel Campero Durand,
Prof. Dr. rer. nat. habil. Gunter Saake

Databases and Software Engineering Workgroup, University of Magdeburg



Pawar, Param: Empirical evaluation of state-of-the-art databases on mixed workloads with HTAPBench. Master’s Thesis, Otto-von-Guericke-Universität Magdeburg, Faculty of Computer Science, 2019.


Abstract

Workloads, the mix of transactions or queries processed by database systems, can be better supported by differently designed systems. The most common workloads are online transactional processing (OLTP), which comprises transactions operating on smaller volumes of data with heavy write operations, and online analytical processing (OLAP), which includes latency-sensitive business intelligence queries operating over large volumes of data. With advances in hardware resources, the long-envisioned demand for database systems operating on mixed workloads has been gaining momentum for many years. These state-of-the-art HTAP (hybrid OLTP and OLAP) systems are built to process mixed workloads efficiently on a single engine. This enables business analytics to be drawn over the most recent data, which is broadly termed operational analytics or real-time analytics.

Database systems are evaluated for their performance with the assistance of standard benchmarks. As with the two distinct database system types (OLTP and OLAP), common benchmarks were designed to address the performance measures of OLTP and OLAP workloads independently. The emergence of HTAP database systems operating on mixed workloads increased the demand for benchmarks that assess system performance for such workloads. To this end, benchmarks like CH-benCHmark and HTAPBench have been built.

This thesis provides an empirical evaluation of modern database systems using the HTAPBench benchmark. This benchmark is unique in enabling a test of the maximum OLAP performance achievable at a target OLTP performance, for a given amount of queries. In our work we study OLAP (MonetDB), OLTP (PostgreSQL, MySQL) and HTAP-specific (CockroachDB, MemSQL) database systems, at different isolation levels and scale factors. We also evaluate the impact of some system-specific configurations. In our study we identify different behaviors from the systems over configuration changes. By using the unified metric of HTAPBench, we are able to identify which systems offer the best trade-offs. We find that MemSQL and PostgreSQL perform better at the serializable isolation level, considering mixed workloads, with the latter being better if the mix prefers OLAP performance. Similarly, we identify MemSQL and MonetDB as performing better at the read committed isolation level, with the latter being better for OLAP-inclined mixed workloads. To our knowledge, our study is the first to report results with the HTAPBench benchmark over commercial systems.


Acknowledgements

It has been an astonishing experience throughout the course of this thesis. I extend my sincere gratitude to all the people who made this thesis possible.

First and foremost, I would like to express my gratitude to my supervisor M.Sc. Gabriel Campero Durand. I have experienced the best possible work environment at his office. I thank him for his valuable directions and his immense backing, be it at stumbling blocks or with microscopic concerns. I am indebted to him for the beautiful journey of this thesis and the learning experience.

I would like to thank Fabio Coelho from the HTAPBench team for supporting us with some obstacles in using their tool. I must also thank the team for building such an interesting tool in the first place.

I would like to thank Prof. Dr. rer. nat. habil. Gunter Saake for giving me the opportunity to draft my Master’s thesis under his chair.

Finally, I must express my indebtedness to my parents for their eternal support and endless inspiration throughout the process of researching and writing this thesis. Thank you.

I would also like to thank my friends for their constructive guidance and support.

With this thesis my long-term collaboration with Otto-von-Guericke-Universität Magdeburg comes to an end. It was a privilege to work in collaboration with the DBSE group.


Declaration of Academic Integrity

I hereby declare that this thesis is solely my own work and that I have cited all external sources used.

Magdeburg, August 8th, 2019

———————
Param Pawar


Contents

List of Figures

List of Tables

1 Introduction
  1.1 Research Aim
  1.2 Research Methodology
  1.3 Structure of Thesis

2 Technical Background
  2.1 Literature Overview
    2.1.1 Workload-Specific Designs in Database Systems
    2.1.2 Workload-Based Database Benchmarks
  2.2 Workload-Specific Designs in Database Systems
    2.2.1 Database Workloads
      2.2.1.1 Workloads Classification
      2.2.1.2 Workload Management
    2.2.2 Online Transaction Processing (OLTP) Database Systems
      2.2.2.1 Key Characteristic Features - OLTP
      2.2.2.2 PostgreSQL
      2.2.2.3 MySQL
    2.2.3 Online Analytical Processing (OLAP) Database Systems
      2.2.3.1 Key Characteristic Features - OLAP
      2.2.3.2 MonetDB
    2.2.4 Hybrid Transactional and Analytical Processing (HTAP) Database Systems
      2.2.4.1 MemSQL
      2.2.4.2 CockroachDB
  2.3 Workload-based Database Benchmarks
    2.3.1 OLTP Benchmark - TPC-C
    2.3.2 OLAP Benchmark - TPC-H
    2.3.3 Hybrid Transactional/Analytical Processing (HTAP) Benchmarks
      2.3.3.1 TPC-CH / CH-benCHmark
      2.3.3.2 HTAPBench
  2.4 Summary

3 Prototypical implementation and research questions
  3.1 Research Questions
  3.2 Evaluation prototype and work process
    3.2.1 HTAPBench Requirements
    3.2.2 Experimental Setup
  3.3 Database Systems
    3.3.1 OLTP Database Systems
    3.3.2 OLAP Database Systems
    3.3.3 HTAP Database Systems
  3.4 Summary

4 Evaluation of OLAP and OLTP Database Systems with HTAPBench
  4.1 Research Questions
  4.2 Results over OLTP Database Systems
    4.2.1 Results Interpretation and Evaluation: PostgreSQL
    4.2.2 Results Interpretation and Evaluation: MySQL
      4.2.2.1 Transaction Isolation: Read-Uncommitted
      4.2.2.2 Transaction Isolation: Read-Committed
      4.2.2.3 Transaction Isolation: Repeatable-Read
      4.2.2.4 Transaction Isolation: Serializable
    4.2.3 Discussion
  4.3 OLAP Database System
    4.3.1 HTAPBench Test Results over MonetDB
    4.3.2 Discussion
  4.4 Summary

5 Evaluation of HTAP Database Systems with HTAPBench
  5.1 Research questions
  5.2 Results Interpretation and Evaluation
    5.2.1 Results over MemSQL
      5.2.1.1 Transaction Isolation: Read-Uncommitted
      5.2.1.2 Transaction Isolation: Read-Committed
      5.2.1.3 Transaction Isolation: Repeatable-Read
      5.2.1.4 Transaction Isolation: Serializable
      5.2.1.5 Column-Store
    5.2.2 Results over CockroachDB
    5.2.3 Discussion
  5.3 Analogy across Results
  5.4 Summary

6 Conclusions and Future Work
  6.1 Threats to Validity
  6.2 Future Work
  6.3 Concluding Remarks

Glossary

Bibliography


List of Figures

1.1 HTAP database system integration [1]
2.1 Multi-dimensional DBMS performance analysis [2]
2.2 Envisioned graphs of system performance with mixed workloads [3]
2.3 Workload management techniques: proposed taxonomy [4]
2.4 Isolation levels vs. read phenomena [5]
2.5 MySQL design architecture [6]
2.6 Row-oriented vs. vectorized column-oriented query processing [7]
2.7 CockroachDB architecture [8]
2.8 TPC-C database schema [9]
2.9 TPC-H database schema [10]
2.10 CH-benCHmark database schema [10]
2.11 HTAPBench architecture [11]
2.12 HTAPBench execution cycle [11]
2.13 Initial acquired results over OLTP SUT [11]
2.14 Initial acquired results over OLAP SUT [11]
2.15 Initial acquired results over HTAP SUT [11]
2.16 Quadrant plot - HTAPBench unified metric
3.1 Prototypical implementation cycle
3.2 HTAPBench - Installing database schema [11]
3.3 HTAPBench - Data loading [11]
3.4 HTAPBench - Launching the test [11]


4.1 System configurable parameters
4.2 PostgreSQL registered performance: Read-Committed
4.3 PostgreSQL registered performance: Serializable
4.4 MySQL JDBC connection configuration
4.5 MySQL registered performance: Read-Uncommitted and target tps 1
4.6 MySQL registered performance: Read-Uncommitted and higher scale factors
4.7 MySQL registered performance: Read-Committed and target tps 1
4.8 MySQL registered performance: Read-Committed and higher scale factors
4.9 MySQL registered performance: Repeatable-Read and target tps 1
4.10 MySQL registered performance: Repeatable-Read and higher scale factors
4.11 MySQL registered performance: Serializable and target tps 1
4.12 MySQL registered performance: Serializable and higher scale factors
4.13 MySQL registered performance: MyISAM engine
4.14 Unified metric quadrant plot: PostgreSQL
4.15 Unified metric quadrant plot: MySQL
4.16 MonetDB JDBC connection configuration
4.17 MonetDB registered performance
4.18 Unified metric quadrant plot: MonetDB
5.1 MemSQL JDBC connection configuration
5.2 MemSQL registered performance: Read-Uncommitted and target tps 1
5.3 MemSQL registered performance: Read-Uncommitted and higher scale factors
5.4 MemSQL registered performance: Read-Committed and target tps 1
5.5 MemSQL registered performance: Read-Committed and higher scale factors
5.6 MemSQL registered performance: Repeatable-Read and target tps 1
5.7 MemSQL registered performance: Repeatable-Read and higher scale factors
5.8 MemSQL registered performance: Serializable and target tps 1
5.9 MemSQL registered performance: Serializable and higher scale factors
5.10 MemSQL registered performance: Column-store


5.11 CockroachDB JDBC connection configuration
5.12 CockroachDB registered performance: Serializable
5.13 Unified metric quadrant plot: MemSQL
5.14 Unified metric quadrant plot: cumulative overview with read committed isolation



List of Tables

2.1 Overview of the initial published results [11]
4.1 Target tps vs. Scale Factor
4.2 PostgreSQL unified metric: Read-Committed
4.3 PostgreSQL unified metric: Serializable
4.4 MySQL unified metric: Read-Uncommitted
4.5 MySQL unified metric: Read-Committed
4.6 MySQL unified metric: Repeatable-Read
4.7 MySQL unified metric: Serializable
4.8 MonetDB unified metric
5.1 MemSQL unified metric: Read-Uncommitted
5.2 MemSQL unified metric: Read-Committed
5.3 MemSQL unified metric: Repeatable-Read
5.4 MemSQL unified metric: Serializable
5.5 Overview of acquired results



1. Introduction

Research into the integration of the two traditional database system types, online transactional processing (OLTP) and online analytical processing (OLAP) systems, has been underway for some years now [12]. Improvements in main-memory capacity and multi-core processors have made it possible to integrate the two traditional database system types with relative isolation, supporting low-latency real-time analytics while keeping a high throughput for OLTP transactions [13]. In this era of big data and artificial intelligence, businesses can no longer compete without insights from real-time analytics over their transactional database systems. This is of prime importance for trading businesses, online stores, social media platforms, etc., where feedback and insights are expected in milliseconds. This is where hybrid engines can play a vital role, relieving the need to integrate many systems to fulfill core business data management needs.

The term hybrid transactional and analytical processing (HTAP) system was first used by Gartner [14], where Pezzini et al. narrated the importance of real-time analytics for business. HTAP refers to system architectures and mechanisms that enable contemporary database management systems (DBMS) to perform real-time analytics on data that is ingested and modified in the transactional database engine [12]. HTAP systems are also described as OLTP database systems with business analytical capabilities, given their objective of sustaining high transactional throughput while still being able to perform analytical operations. Another crucial aspect is that the substantial ETL processes that pipeline data from transactional memory to warehouses are eschewed in HTAP database systems.

A brief understanding of the scale of integration can be gained from Figure 1.1. Fresh or recent data resides in main memory, over which real-time analytics can be acquired, while stale data is periodically flushed into warehouse-like storage. The storage design aspects are subject to individual database management systems and can produce varying results based on their design and configuration characteristics [3], including aspects like indexing, memory partitioning, isolation levels, query processing approaches, compilation, compression, durability and recovery, etc. Some example systems that make different choices about these designs include SAP Hana, MemSQL, HyPer, and SnappyData, among others.

Figure 1.1: HTAP database system integration [1]

With the emergence of hybrid (HTAP) database systems, there is an urgency to have a benchmark capable of testing such systems at different proportions of hybrid workloads. The traditional approach to testing database systems relies on separate benchmarks for transactional and analytical capabilities. HTAP systems, too, have initially been tested against these benchmarks, TPC-C and TPC-H, independently, to evaluate the systems' transactional and analytical capabilities, respectively. Some examples include work testing [15] three different in-memory database systems with the TPC-H benchmark to evaluate query execution times.

The state of the art in HTAP testing seems to be the work of Coelho et al. [11]. The authors proposed a benchmark termed HTAPBench to test database systems with hybrid workloads and to log a unified metric which the traditional benchmarking approaches failed to produce. This unified metric can help us interpret the performance of the underlying HTAP database system, trade-offs between transaction and query execution, concurrency of clients, workload isolation, etc. However, to date, beyond the original work, where the names of the systems under test are anonymous, this benchmark has not been used to evaluate systems in an open manner. Hence there is an important research gap in using this benchmark for trade-off analysis of mainstream database systems. This gap limits the information that users and system developers have about the expected behavior of systems over mixed workloads.
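
To illustrate the kind of trade-off such a unified metric captures, the following sketch counts analytical throughput per analytical worker, but only for runs that actually sustain their transactional target. This is a simplified illustration of the idea, not HTAPBench's exact formula; the function name and validity rule are our own.

```python
# Simplified sketch of a unified HTAP metric: analytical queries per hour,
# normalized per OLAP worker, counted only while the system sustains its
# transactional target. Illustrative only -- not HTAPBench's exact formula.
def unified_metric(queries_per_hour: float, olap_workers: int,
                   achieved_tps: float, target_tps: float) -> float:
    if achieved_tps < target_tps:
        return 0.0  # the OLTP target was missed, so the OLAP score is void
    return queries_per_hour / olap_workers
```

Under this sketch, a run completing 120 analytical queries per hour with 4 OLAP workers while meeting its transactional target would score 30.0, whereas the same run missing the target scores 0.0.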


In this thesis, we propose to fill this research gap by evaluating database systems built for different workload categories (OLTP, OLAP and HTAP) using HTAPBench. We perform experiments to report the impact of selected system configuration parameters. Through our work we seek to identify core performance differences among the categories of systems for mixed workloads when changing parameters. Furthermore, based on the metrics, we perform some trade-off analysis to better understand how such a technique can be used to study systems with the unified metric proposed by HTAPBench.

1.1 Research Aim

The principal focal points determined for this thesis, which we will use to elaborate our research questions at a later phase, are listed below:

1. By studying systems with the benchmark, is it possible to identify differences in the behavior of systems of different types, considering how they behave when scaling up the workloads and when changing some configurable parameters?

2. Can the benchmark identify weak configurations for some systems, allowing us to distinguish between systems of the same type in their support for mixed workloads?

3. To what extent can the behaviors identified by Coelho et al. [11] be replicated?

4. How can the unified metric given by the benchmark be used to select systems, with some trade-off analysis, for a given expected workload scale and perhaps expected configurations?

1.2 Research Methodology

Database system benchmarking is an area with rapidly growing demands. There are numerous vendors with state-of-the-art database systems today, making it a challenging task for businesses to filter down the best system choice for their operations. Standard benchmarking is also necessary for system builders to understand and report the benefits of their systems in a reliable manner.

In our work we comply with the standard industry approach to database benchmarking, which broadly comprises the following set of tasks:

• Establish the desired performance objectives peculiar to the applications or business operations.

• Configure the database system parameters in the context of the target performance objectives and the available system resources.


• Select the most appropriate industry-standard benchmark for the blend of workload expected in the real-world scenario.

• Deploy the benchmark over the system under test and interpret the acquired test results.

• Repeat the tests with modified configurations to observe the impact on system performance.

• Filter down to the configuration with operational excellence for the performance objectives in consideration.

• Validate the results with additional tests for the selected configuration model.

• Repeat the process for other potential database systems and compare the results using the benchmark metric.
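
The iterative workflow above can be sketched as a simple driver loop. This is a hypothetical harness of our own: `run_benchmark` is a stand-in for invoking an actual tool such as HTAPBench and parsing its metric, and is not part of any benchmark's API.

```python
import statistics

def evaluate(systems, configs, run_benchmark, repetitions=3):
    """For each system, try every configuration `repetitions` times,
    keep the configuration with the best median metric (validating by
    repetition), and rank the systems by that metric, highest first."""
    best_per_system = {}
    for system in systems:
        best = None
        for config in configs:
            # Repeat runs with the same configuration to validate results.
            scores = [run_benchmark(system, config) for _ in range(repetitions)]
            median = statistics.median(scores)
            if best is None or median > best[1]:
                best = (config, median)  # keep the strongest configuration
        best_per_system[system] = best
    # Compare systems using the benchmark metric.
    return sorted(best_per_system.items(), key=lambda kv: kv[1][1], reverse=True)
```

In practice the driver would also reload the database between runs; that step is omitted here for brevity.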

1.3 Structure of Thesis

This thesis is framed as follows:

• Chapter 2: Provides an overview of key aspects and features peculiar to state-of-the-art database systems. This chapter provides the basic background knowledge on workloads and workload-specific database systems. The final section presents the design architecture and metric information of industry-standard workload-specific benchmarks. We have also included the initial results published by the developers of HTAPBench.

• Chapter 3: Elaborates our research questions pertaining to what is to be evaluated over the selected systems, using the selected benchmark. This chapter also describes the evaluation prototype employed throughout the course of our experiments, and implementation details.

• Chapter 4: Presents the test results acquired for the OLTP and OLAP database systems studied, and discusses the evaluation of these results in the context of HTAP functionality.

• Chapter 5: Presents the test results acquired for the HTAP database systems. It also presents a comparison of all the results acquired for the different database systems.

• Chapter 6: Concludes our research work and sums up the findings from our experiments. We also elaborate the threats to the validity of our findings and frame a direction for future work in our research area. Finally, we wrap up this thesis with concluding remarks.


2. Technical Background

In this chapter, we provide a basic theoretical background on workload-specific designs in database systems, and on benchmarking systems over these workloads. Since database design is a large field, we limit our discussion to the concepts necessary to understand our research.

We frame this chapter as follows:

• Literature Overview: We begin the chapter by presenting the sources we utilized to study the context of our work, in Section 2.1.

• Workload-specific designs in database systems: In Section 2.2, we elaborate on details pertaining to database workload management, workload classifications, and examples of databases residing in these categories. In this section we provide an overview of the systems that we used for our evaluation.

• Database benchmarking with different workloads: We provide an introduction to database system benchmarking. We discuss the motivation to use benchmarks, details pertaining to traditional benchmarks for individual workloads, and benchmarks employing mixed/hybrid workloads, in Section 2.3. In this section we review the work of HTAPBench, which was selected for our study.

• Summary: We close this chapter with a brief recapitulation of the theoretical concepts elaborated here (database workloads and workload-specific benchmarks), in Section 2.4.

2.1 Literature Overview

In this section, we provide a bird's-eye view of the papers we selected for our presentation of the topics: workload-specific database designs and database system benchmarking for different types of workloads.


2.1.1 Workload-Specific Designs in Database Systems

Database management systems deployed in the corporate world are both complex and flexible, with a plethora of configuration choices. A database system's performance is primarily influenced by the choice of database configurations (e.g., index structures, schema), the underlying hardware resources (e.g., memory available), the data itself (e.g., the data size), and, finally, the nature of the workload (e.g., the number of queries) [2]. A database workload can be characterized based on whether the requests processed are transactional or analytical (queries). Traditionally, workloads were broadly classified into two categories: Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) workloads. However, with the ever-changing nature of workloads specific to individual applications, another category has joined the league in recent years: Hybrid Transactional and Analytical Processing (HTAP). This hybrid workload combines the high transactional throughput expectations of an OLTP workload with the complex queries of an OLAP workload.
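
As a toy illustration of this three-way classification, a workload could be labeled from two coarse signals: the fraction of write requests and the typical number of rows a request scans. The thresholds and signal choices below are invented for the example; real workload classifiers use much richer features.

```python
# Toy heuristic for the OLTP / OLAP / HTAP workload classification
# described above. Thresholds are invented for illustration only.
def classify_workload(write_fraction: float, avg_rows_scanned: float) -> str:
    if write_fraction > 0.5 and avg_rows_scanned < 1_000:
        return "OLTP"  # write-heavy requests touching little data
    if write_fraction < 0.1 and avg_rows_scanned >= 1_000:
        return "OLAP"  # read-mostly queries scanning large volumes
    return "HTAP"      # a mix of transactional writes and analytics
```

For instance, a workload that is 80% writes over small row counts lands in the OLTP bucket, while one mixing substantial writes with large scans lands in HTAP.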

Database management systems were habitually designed with a focus on either OLTP or OLAP, to address business requirements calling either for high rates of I/O operations for user-facing applications, or for fast analytics for business intelligence. However, with the emergence of hybrid workloads, the design focus has shifted largely towards HTAP database management systems, which seek to combine support for both requirements in a single design. HTAP systems are also commonly main-memory systems.

The literature sources that we have identified as useful for understanding the nature of workloads, workload management, and the design focus of database management systems are listed as follows:

• Zhang, Mingyi, Patrick Martin, Wendy Powley and Jianjun Chen. “Workload management in database management systems: A Taxonomy” (2017) [4]: Provides an overview of workload management and presents a taxonomy for the classification of workload management mechanisms deployed in database management systems.

• Sen, Rathijit and Karthik Ramachandra. “Characterizing resource sensitivity of database workloads” (2018) [2]: Provides a detailed analysis of the impact of different hardware resource configurations on database system performance, over different workloads.

• Elnaffar, Said, Pat Martin, and Randy Horman. “Automatically classifying database workloads” (2002) [16]: Presents a classifier model to automatically identify and characterize database workloads.

• Stonebraker, Michael and Lawrence A. Rowe. “The design of Postgres” (1986) [17]: Highlights the early history, design and development of PostgreSQL from its predecessor Ingres.


• Boncz, Peter. “Monet; a next-generation DBMS kernel for query-intensive applications” (2002) [18]: Presents an overview of the architectural design of MonetDB and its query optimization mechanisms.

• Pavlo, Andrew, and Matthew Aslett. “What’s Really New with NewSQL?” (2016) [19]: Highlights design aspects of a generation of OLTP/HTAP, mostly in-memory systems called NewSQL database systems. This includes MemSQL and CockroachDB.

• Pezzini, Massimo, Donald Feinberg, Nigel Rayner and Roxane Edjlali. “Hybrid Transaction/Analytical Processing Will Foster Opportunities for Dramatic Business Innovation” (2014) [14]: First coined the term Hybrid Transactional/Analytical Processing (HTAP) and highlights the impact of HTAP systems on businesses.

• Ozcan, Fatma, Yuanyuan Tian and Pinar Tozun. “Hybrid Transactional/Analytical Processing: A Survey” (2017) [20]: Provides an overview of OLTP and OLAP database systems and an in-depth analysis of systems offering HTAP functionality, along with their architectures and trade-off mechanisms.

• Bohm, Alexander, Jens Dittrich, Niloy Mukherjee, Ippokratis Pandis and Rajkumar Sen. “Operational analytics data management systems” (2016) [21]: Early survey on HTAP systems, describing academic prototypes.

• Giceva, Jana, and Mohammad Sadoghi. “Hybrid OLTP and OLAP” (2019) [22]: Gives a definition of HTAP systems and discusses their main aspects.

• Zhang, Hao, Gang Chen, Beng Chin Ooi, Kian-Lee Tan, and Meihui Zhang. “In-memory big data management and processing: A survey” (2015) [23]: Comprehensive survey on design aspects of in-memory database systems.

• Pavlo, Andrew, Gustavo Angulo, Joy Arulraj, Haibin Lin, Jiexi Lin, Lin Ma, Prashanth Menon, Todd Mowry, Matthew Perron, Ian Quah, Siddharth Santurkar, Anthony Tomasic, Skye Toor, Dana Van Aken, Ziqi Wang, Yingjun Wu, Ran Xian and Tieying Zhang. “Self-Driving Database Management Systems” (2017) [24]: Highlights the automated classification of workloads and the corresponding optimization as a column-oriented or row-oriented engine in Peloton.

• Arulraj, Joy, Andrew Pavlo and Prashanth Menon. “Bridging the Archipelago between Row-Stores and Column-Stores for Hybrid Workloads” (2016) [25]: Proposes a new in-memory DBMS architecture that supports hybrid workloads by optimizing the layout of data segments based on their query access patterns.

• Kim, Kangnyeon, Tianzheng Wang, Ryan Johnson and Ippokratis Pandis. “ERMIA: Fast Memory-Optimized Database System for Heterogeneous Workloads” (2016) [26]: Proposes a memory-optimized database system capable of processing hybrid OLTP and OLAP workloads, providing concurrency control with serializable snapshot transaction isolation.

• Wu, Yingjun, Joy Arulraj, Jiexi Lin, Ran Xian, and Andrew Pavlo. “An empirical evaluation of in-memory multi-version concurrency control” (2017) [27]: Describes several design possibilities for multi-version concurrency control, a commonly used mechanism. The authors compare the performance of systems like PostgreSQL and MemSQL over the TPC-C benchmark.

• Darfler, Benjamin. “CockroachDB: A Scalable, Geo-Replicated, Transactional Datastore” (2014) [28]: Article presenting the HTAP functionality of CockroachDB, an open-source datastore inspired by Google Spanner, alongside some of its key features.

2.1.2 Workload-Based Database Benchmarks

In the context of database systems, a benchmark is a set of designed operations executed over a designed database to evaluate a system’s performance. This set of operations serves as a reference to evaluate and compare the performance of different systems, or of different configurations of the same system. Over the decades, benchmarks for evaluating database systems have evolved in step with changes in both database workloads and database management systems.

The Transaction Processing Performance Council (TPC), a non-profit organization, has over the decades developed many industry-standard database and transaction processing benchmarks [29]. Like the database systems themselves, benchmarks were designed to individually address mainly transaction processing or analytical workloads: TPC-C and TPC-E are two of the most used benchmarks for comparing the transaction processing capabilities of database systems, while TPC-H and TPC-DS are deployed as decision support benchmarks. However, with the emergence of database systems processing hybrid workloads, there was a legitimate need for benchmarks with mixed workloads to evaluate state-of-the-art database systems. This has led to the development of benchmarks like HTAPBench and the CH-benCHmark, which are categorized as HTAP system benchmarks.

The literature resources which we deemed important for understanding the elemental structure of an OLTP workload benchmark, an OLAP workload benchmark and a hybrid workload benchmark are the following:

• Dietrich, S. W., M. Brown, E. Cortes-Rello and S. Wunderlin. “A Practitioner’s Introduction to Database Performance Benchmarks and Measurements” (1992) [30]: Provides an overview of database performance benchmarks and a pragmatic comparison between transaction processing and decision support benchmarks at the time of writing.

• Difallah, Djellel Eddine, Andrew Pavlo, Carlo Curino and Philippe Cudre-Mauroux. “OLTP-Bench: An Extensible Testbed for Benchmarking Relational Databases” (2013) [31]: Presents OLTP-Bench, an extensible testbed for benchmarking relational databases.

• Leutenegger, Scott T. and Daniel Dias. “A modeling study of the TPC-C benchmark” (1993) [32]: Presents the detailed design and workload structure of the transaction processing benchmark TPC-C.

• Chen, Shimin, Anastasia Ailamaki, Manos Athanassoulis, Phillip B. Gibbons, Ryan Johnson, Ippokratis Pandis and Radu Stoica. “TPC-E vs. TPC-C: characterizing the new TPC-E benchmark via an I/O comparison study” (2010) [33]: Provides an overview of the newer, enhanced transaction processing benchmark TPC-E and a thorough comparison with TPC-C.

• Nambiar, Raghunath and Meikel Poess. “TPC-H Analyzed: Hidden Messages and Lessons Learned from an Influential Benchmark” (2013) [34]: Discusses in depth the TPC-H benchmark and the database design issues that this workload can help to study.

• Hrubaru, Ionut and Marin Fotache. “On the Performance of Three In-Memory Data Systems for On Line Analytical Processing” (2017) [35]: Presents an evaluation of three in-memory database systems, including MemSQL, for analytical workloads using TPC-H, analyzing memory footprint, query execution time and data loading.

• Kaur, Karambir and Monika Sachdeva. “Performance evaluation of NewSQL databases” (2017) [36]: Presents a pragmatic evaluation of four NewSQL database systems (CockroachDB, MemSQL, NuoDB and VoltDB) and a classification of these systems for transaction processing (OLTP) workloads in the context of big data and its management.

• Bog, Anja, Hasso Plattner and Alexander Zeier. “A mixed transaction processing and operational reporting benchmark” (2011) [37]: Proposes a benchmark that integrates real-time (operational) reporting queries and transaction processing for the performance evaluation of database systems with mixed workloads.

• Cole, Richard, Florian Funke, Leo Giakoumakis, Wey Guy, Alfons Kemper, Stefan Krompass, Harumi Kuno, Raghunath Nambiar, Thomas Neumann, Meikel Poess, Kai-Uwe Sattler, Michael Seibold, Eric Simon and Florian Waas. “The mixed workload CH-benCHmark” (2011) [38]: Proposes the CH-benCHmark, a new benchmark for hybrid workloads intended to produce comparable insights on HTAP database systems, as well as on OLTP and OLAP database systems.

• Psaroudakis, Iraklis, Florian Wolf, Norman May, Thomas Neumann, Alexander Bohm, Anastasia Ailamaki and Kai-Uwe Sattler. “Scaling Up Mixed Workloads: A Battle of Data Freshness, Flexibility, and Scheduling” (2015) [3]: Presents a performance evaluation of two state-of-the-art in-memory database systems, SAP HANA and HyPer, for mixed workload processing using the CH-benCHmark. The authors highlight design and configuration aspects that can influence overall performance.

• Coelho, Fabio, Joao Paulo, Ricardo Vilaca, Jose Pereira and Rui Oliveira. “HTAPBench: Hybrid Transactional and Analytical Processing Benchmark” (2017) [11]: Proposes HTAPBench, a benchmark aimed at producing a unified metric for the performance evaluation of database systems under hybrid workloads. This is the benchmark used for the experiments within the scope of this thesis.

2.2 Workload-Specific Designs in Database Systems

In this section we provide an overview of the types of workloads, workload management mechanisms and examples of database systems designed to address these workloads. We will also briefly elaborate on the workload management taxonomy proposed by Zhang et al. [4] and the dimensions to consider for the performance evaluation of a database management system.

2.2.1 Database Workloads

A database workload is defined as a set of processing requests featuring common aspects. The type of workload can be categorized by the nature of the requests received for processing, which can be write operations (transactions), read-only operations (queries) or read-write operations.

2.2.1.1 Workload Classification

Database workloads have evolved over the years and are changing today more than ever. They can be predominantly categorized into three types, as follows:

• On-Line Transaction Processing (OLTP) workloads

OLTP workloads are marked by short interactive operations or transactions with reaction times in milliseconds. They predominantly consist of operations like data updates, small queries, inserts and deletes over single records, executed with high concurrency. OLTP workloads comply with the ACID properties, and OLTP database systems are specifically designed to support such workloads. The storage layout of transactional (OLTP) database systems is commonly row-oriented, with a focus on short transactions touching a single record or a few records. A commonly used index structure in OLTP systems is the B-tree, which enhances the system's concurrency by avoiding complete table scans and enabling more fine-grained locks.

• On-Line Analytical Processing (OLAP) workloads

Also termed decision support workloads, OLAP workloads aim to extract meaningful insights from large databases through the deployment of analytical reporting queries. The data size considered per query is bigger than for transactional queries and can reach up to terabytes of data. OLAP database systems are designed to support queries comprising scans over large numbers of data records or rows. Unlike OLTP workloads, OLAP workloads are mostly read-only. The common storage layout for such database systems is a column-oriented store. The major advantage of columnar storage is the reduced memory footprint obtained through higher compression rates, which are possible because all values in a column share the same data type.

• Hybrid Transactional and Analytical Processing (HTAP) or hybrid workloads

Also known by other names, such as real-time operational analytics or mixed workloads, this kind of workload comprises query processing over transactional data, allowing businesses to obtain real-time analytics over freshly written records in a database. The traditional approach to analytics on warehoused data relied on ETL processes to transfer data from the transactional store to the warehouse; HTAP database systems dispense with these ETL processes. While different DBMSs use many design choices to build an HTAP database system, a common approach is a column-oriented storage layout alongside a connected, updatable row-store. Analytics is performed over the less recent data maintained in the periodically updated column-store, complemented with recent data from the row-store. The details vary with the transaction isolation levels configured in the design of such systems; serializable snapshot isolation is used in many state-of-the-art database systems offering HTAP functionality.
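As a rough illustration of the delta/main design just described, the following sketch (all class and method names are hypothetical, not from any real system) keeps fresh writes in a row-oriented delta store, periodically merges them into a column-oriented main store, and lets analytical reads combine both:

```python
# Hypothetical sketch of the HTAP delta/main design: fresh writes land in a
# row-oriented delta store, a periodic merge moves them into a column-oriented
# main store, and analytical queries read both to see fresh data.

class HybridStore:
    def __init__(self, columns):
        self.columns = columns
        self.main = {c: [] for c in columns}   # column-oriented main store
        self.delta = []                        # row-oriented delta store

    def insert(self, row):
        """OLTP path: append the whole row to the delta store."""
        self.delta.append(row)

    def merge(self):
        """Periodic merge: fold delta rows into the columnar main store."""
        for row in self.delta:
            for c in self.columns:
                self.main[c].append(row[c])
        self.delta.clear()

    def column_sum(self, column):
        """OLAP path: scan the column in main, plus fresh rows in delta."""
        return sum(self.main[column]) + sum(r[column] for r in self.delta)


store = HybridStore(["id", "amount"])
store.insert({"id": 1, "amount": 10})
store.insert({"id": 2, "amount": 32})
store.merge()                                  # move rows into the column store
store.insert({"id": 3, "amount": 8})           # fresh row, not yet merged
print(store.column_sum("amount"))              # -> 50 (merged + fresh data)
```

The analytic read sees the freshly inserted row without waiting for the next merge, which is the essence of avoiding a separate ETL pipeline.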

The impact of design choices is peculiar to the database workload in use. Sen et al. [2] characterized resource sensitivity for all three database workload types. Their multi-dimensional evaluation considered the following key aspects:

• The three types of workloads - OLTP, OLAP and HTAP.

• Varying data sizes - both fitting in memory and exceeding the available memory.

• Hardware resources - main memory, disk storage, number of cores, cache, etc.

• Database configuration choices - column- or row-oriented store, storage layout, index structures, etc.

Figure 2.1 depicts the spectrum of their experimental evaluation. Of all the characterizations, the aspects of interest for our work are the sensitivity to intra-query parallelism, the number of cores and the memory capacity.

The key takeaways from their analysis with regard to the different workloads are:

Figure 2.1: Multi-dimensional DBMS performance analysis [2]

• OLTP workload

For transactional throughput, an increase in the number of cores has a greater impact than increasing the cache size beyond a certain threshold. However, a high degree of (hyper-)threading can in some cases produce unfavourable performance and thus requires configurations appropriate to the individual workload.

• OLAP workload

Queries use main memory as temporary storage for their intermediate results. Sen et al. [2] revealed through their experiments that, by designating appropriate memory sizes, higher concurrency can be achieved with more concurrent queries.

• HTAP workload

Parallelism is influenced greatly by the scale factor and the complexity of the query. While some queries performed better with a serial execution plan for lower scale factors, others remained insensitive. Higher scale factors showed improved results for parallel execution plans. The more complex queries here are typical of OLAP workloads, while the simpler queries pertain to OLTP workloads, with smaller execution times and data scan ranges.

Apart from the work of Sen et al., Psaroudakis et al. [3] have also studied systems for HTAP workloads, examining the impact of scaling up the hybrid workload by increasing the number of concurrent clients. Figure 2.2 depicts the design implications that the authors conclude from their experiments, considering three prime elements: data freshness for analytical operations, flexibility of transactions, and scheduling.

Figure 2.2: Envisioned graphs of system performance with mixed workloads [3]

• When performing analytical operations over data residing in data warehouses, OLAP performance will be high, albeit at the cost of a low level of data freshness. Hence, the authors note that as freshness requirements increase, analytical performance will tend to deteriorate.

• Transactional flexibility refers to the database system's restrictions on OLTP workloads: a restrictive design might force all queries to be executed as stored procedures, while a permissive one might perform query compilation. The authors argue (but do not evaluate this) that increasing the restrictions leads to higher transactional performance.

• Scheduling refers to workload management mechanisms and the trade-offs between OLTP and OLAP workloads, relying upon priorities determined by the workload management mechanism employed [3]. The authors observe a house-shaped pattern, where increasing the number of concurrent OLAP clients will likely, at some point, decrease the performance of a fixed number of OLTP clients. How well the database isolates both workloads is also called workload isolation.

2.2.1.2 Workload Management

Workload management is a course of actions vital for the efficient utilization of system resources and for achieving the best performance for the desired operations. Considering the different aspects that determine good performance for different workload mixes [2, 3], how the database manages its workloads can be expected to be a crucial factor for overall good performance.

Workload management actions include monitoring and managing the workload and controlling the flow of requests over a database system. Database systems focused on processing different workloads employ different workload management mechanisms. The choice of workload management techniques used in a database system design determines the relative weight given to OLTP and OLAP workloads. This makes the selection of a management technique crucial for achieving the performance objectives of any database system.

Zhang et al. [4] proposed a taxonomy to understand the organization of workload management techniques employed by modern database management systems. This taxonomy is helpful for classifying all of the current workload management techniques and evaluating their performance. To gain an in-depth understanding of the design choices among the available workload management techniques, we will briefly discuss this taxonomy and the impact and use of the different techniques.

Figure 2.3: Workload management techniques: proposed taxonomy [4]

Figure 2.3 depicts the taxonomy proposed by Zhang et al. [4]. The workload management techniques are principally categorized into four classes, which are further divided into sub-classes depending on their specific mechanisms. We will elaborate on each of the classes and their sub-classes to describe their specific characteristics.

• Workload Characterization

Workload characterization can be outlined as the set of operations to identify the essence of a workload and assign it to a particular class (OLTP, OLAP or HTAP). This is an important aspect for any workload management technique in order to plan system resource allocation. This class is further divided into two sub-classes: static and dynamic characterization. Static characterization identifies the workload class prior to the arrival of requests at the database system and allocates system resources to the identified workloads. Dynamic characterization determines the workload class upon the arrival of requests at the database system or server. This mechanism is still largely confined to the research literature, which proposes the use of machine learning algorithms to build a workload classifier [16].

• Admission Control

In the context of OLTP database systems, admission control is a measure of the maximum number of clients permitted to connect to the database. A higher number of clients improves system throughput up to the threshold imposed by the underlying resource contention, with throughput declining thereafter and, in some cases, the system crashing if the number of requests far exceeds the system's capabilities. In the context of HTAP workloads, admission control takes on an additional responsibility beyond controlling the number of connected clients: it ensures that the system achieves the desired performance for all arriving requests. This class is further divided into two sub-classes: threshold-based and prediction-based admission control. The threshold-based mechanism is the traditional approach of setting a parameter threshold and accepting only requests that do not breach it; it is currently used in the majority of modern database systems. The prediction-based mechanism is still largely under research and involves building a model that predicts system performance for a query prior to its execution. Such techniques are developed using machine learning approaches and rely on system performance metrics [39].

• Scheduling

Conventionally, database systems employ scheduling, i.e., monitoring and establishing the order in which arriving requests are processed to achieve the desired performance targets. This class is further divided into two sub-classes: queue management and query restructuring. Queue management techniques schedule the requests based on workload properties like resource requirements, priorities, etc. Query restructuring is similar to queue management, with the addition of query decomposition, where a query is broken down into smaller queries. These smaller queries are then placed in the queue to schedule their processing times [40].

• Execution Control

Execution control can be described as the set of measures to control the impact of a request's processing on the performance of other concurrently processed requests. This is mainly governed by the transaction isolation levels in the database system. This class is sub-divided into three sub-classes: query reprioritization, query cancellation and request suspension. Query reprioritization is the dynamic adaptation of a query's priority during its execution. Query cancellation terminates a running query and re-allocates the system resources it was using; the terminated query may then be re-queued to be successfully processed, depending upon the database system's control policy [41]. Request suspension is the mechanism of temporarily halting a query being processed, securely storing its intermediate results and resuming the query's processing later [42].
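Of the four classes above, threshold-based admission control is the simplest to picture. A minimal sketch (class and method names are hypothetical, not taken from any real system) admits requests only while the number of active clients stays below a configured threshold:

```python
# Hypothetical sketch of threshold-based admission control: a request is
# admitted only while the number of active clients is below a fixed threshold.

class AdmissionController:
    def __init__(self, max_clients):
        self.max_clients = max_clients
        self.active = 0

    def try_admit(self):
        """Admit the request only if the threshold is not breached."""
        if self.active >= self.max_clients:
            return False          # reject: would exceed the threshold
        self.active += 1
        return True

    def release(self):
        """Called when a client finishes its request."""
        self.active -= 1


ctrl = AdmissionController(max_clients=2)
print(ctrl.try_admit())  # True
print(ctrl.try_admit())  # True
print(ctrl.try_admit())  # False: threshold reached, request rejected
ctrl.release()
print(ctrl.try_admit())  # True again after a slot is freed
```

A prediction-based controller would replace the fixed comparison with a learned model estimating the cost of the incoming query before deciding.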

To the best of our knowledge, most database systems, including the ones that we study, do not offer advanced automation of workload management, comprising, for example, workload characterization or admission control based on knowledge of the workload.

Having described the classification of workloads and the modern techniques of workload management, we will now present an overview of key concepts of database systems pertaining to these different types of workloads.

2.2.2 Online Transaction Processing (OLTP) Database Systems

In this section we present some example database systems designed to process transactional database workloads. We provide an overview of a few important features pertaining to these database systems, which can be useful to reason about the system performance observed through our experiments using HTAPBench.

2.2.2.1 Key Characteristic Features - OLTP

OLTP database systems are characterized by distinct features designed for prompt transactional operations. Some key features of this class of database systems are listed below:

• Storage layout

A row-oriented storage layout is primarily used by database systems focused on transactional operations. Row-oriented database systems held in main memory widen the possibilities for real-time operational analytics.

• Transaction Isolation

Transaction isolation levels regulate the visibility of a transaction's effects to other users of the system. In elementary terms, transaction isolation ensures that any transaction being processed behaves as if it were the sole occupant of the database resources. Improper isolation between transactions may lead to incorrect query responses, generally termed read phenomena. The two main read phenomena are the following:

– Dirty read: It occurs when two transactions collide while accessing the same data: a transaction reads a data record that is being modified by another transaction which has not yet committed.

– Non-repeatable read: This is another situation of transactions concurrently accessing the same data records. When a transaction reads the same data record twice and another concurrent transaction has performed an update on that record in between, the original transaction may read a different value, even though, from its perspective, the value should logically have remained the same.

The ANSI SQL standard defines four transaction isolation levels:

– Read Uncommitted: The lowest level of transaction isolation, vulnerable to dirty reads.

– Read Committed: Read queries at this isolation level can see only data committed prior to the query's initiation; uncommitted data from concurrent transactions is unavailable to read operations. This prevents dirty reads, since a query never sees data from uncommitted transactions.

– Repeatable Read: A transaction can only see data committed prior to its commencement, which prevents phenomena such as dirty reads and non-repeatable reads. It applies read locks and write locks on the accessed data records.

– Serializable: The strictest transaction isolation level; it emulates serial transaction execution. This can cause frequent transaction failures, and applications must retry transactions to commit successfully, which may hinder overall OLTP transaction execution speed [43].

Figure 2.4: Isolation levels v/s read phenomena [5]

Figure 2.4 depicts the influence of different transaction isolation levels over concurrency and consistency. The isolation levels are listed in increasing order of strictness, alongside the read phenomena likely to ensue at each of these levels.
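The difference between Read Committed and Repeatable Read can be illustrated with a small simulation (this is not a real DBMS; all names are invented for illustration): a Repeatable Read transaction works against a snapshot taken at its start, so a concurrently committed update does not change its second read, while a Read Committed transaction re-reads the latest committed state:

```python
# Simulated isolation levels (not a real DBMS): committed_state holds only
# committed data, so dirty reads are out of scope; the simulation contrasts
# non-repeatable reads under Read Committed vs. Repeatable Read.

committed_state = {"balance": 100}

class Transaction:
    def __init__(self, db, isolation):
        self.db = db
        self.isolation = isolation
        # Repeatable Read: freeze a snapshot at transaction start.
        self.snapshot = dict(db) if isolation == "repeatable_read" else None

    def read(self, key):
        if self.isolation == "repeatable_read":
            return self.snapshot[key]   # always the snapshot value
        return self.db[key]             # Read Committed: latest committed value

tx_rc = Transaction(committed_state, "read_committed")
tx_rr = Transaction(committed_state, "repeatable_read")

first_rc, first_rr = tx_rc.read("balance"), tx_rr.read("balance")
committed_state["balance"] = 70         # another transaction commits an update

print(first_rc, tx_rc.read("balance"))  # 100 70  -> non-repeatable read
print(first_rr, tx_rr.read("balance"))  # 100 100 -> read is repeatable
```

Real systems enforce these guarantees with locks or multiversioning rather than a copied dictionary, but the observable difference between the two levels is the same.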

• Multiversion Concurrency Control (MVCC)

One feature of utmost interest in transaction processing is data consistency over concurrent requests. This is maintained internally by a concurrency control model. The most common model in today's main-memory systems is multiversion concurrency control (MVCC). This method allows SQL queries at any given time to access a snapshot of the data (a tuple version from some point in time), irrespective of the present state of the underlying data. Thus, the MVCC model prevents SQL statements from reading inconsistent data from a row on which concurrent transactions are performing updates. Traditional database systems use locking techniques like two-phase locking, where conflicts between a reader and a writer result in blocking; this can also be combined with MVCC. In addition, MVCC can be implemented with optimistic concurrency control or timestamp ordering. The authors of [27] show that other aspects of an MVCC configuration, such as the storage scheme for versions, the garbage collection process for old versions, and the interplay between MVCC and indexes, can also influence performance.
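A minimal sketch of the timestamp-based MVCC idea described above (names are hypothetical, and real systems add transaction begin/abort bookkeeping and garbage collection): each key keeps a chain of versions tagged with the commit timestamp that created them, and a reader with snapshot timestamp t sees the newest version committed at or before t:

```python
# Hypothetical MVCC sketch: each key maps to a list of (commit_ts, value)
# versions; a reader sees the newest version with commit_ts <= its snapshot.

class MVCCStore:
    def __init__(self):
        self.versions = {}    # key -> list of (commit_ts, value), ts ascending
        self.clock = 0        # monotonically increasing commit timestamp

    def write(self, key, value):
        """Commit a new version of key at the next timestamp."""
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))

    def read(self, key, snapshot_ts):
        """Return the latest version visible to the given snapshot."""
        for ts, value in reversed(self.versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None           # key did not exist at snapshot time


store = MVCCStore()
store.write("x", "v1")               # committed at ts=1
snap = store.clock                   # a reader takes its snapshot here
store.write("x", "v2")               # committed at ts=2, after the snapshot

print(store.read("x", snap))         # v1: the later update is invisible
print(store.read("x", store.clock))  # v2: a fresh snapshot sees the new version
```

The reader never blocks the writer and vice versa, which is the property that makes MVCC attractive for mixed workloads.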

2.2.2.2 PostgreSQL

PostgreSQL is an open source object-relational database management system (ORDBMS) 1. PostgreSQL evolved from the Ingres project (an early relational database management system) at the University of California, Berkeley [17]. Its first version was released in 1989, initially with its own query language, with SQL support added in the mid-1990s. The DBMS is kept operational by developers and volunteers across the globe under the PostgreSQL Global Development Group [44]. PostgreSQL nowadays supports many modern features like complex queries, updatable views, transactional integrity and multiversion concurrency control. Additional features, beyond the generalized characteristic features, are described below.

• Query Parallelism

PostgreSQL supports a feature known as parallel query, which lets queries leverage several CPUs. Not all queries can benefit from this feature due to current development limitations. Nonetheless, queries which scan a large amount of data only to return a few rows can benefit predominantly, speeding up query processing multiple times. The query optimizer determines the quickest execution strategy for any given query. If it determines a parallel execution strategy to be quicker than serial execution, it creates a corresponding query plan. This plan comprises a Gather Merge node, which holds one child plan that is in turn a part of the entire query plan. Either the entire query or only a portion of the plan is executed in parallel, depending upon the position of the Gather Merge node in the query plan tree. If this node resides at the top of the query plan tree, then the entire query is processed in parallel. If the node resides on a branch of the query plan tree, then only the portion of the query pertaining to that branch is processed in parallel.

• Parallel scans

Every parallel portion (child plan) of the query plan is recognized by the query optimizer as a partial plan, acquiring a subset of the desired complete output. This is realized by parallel-aware scans. The addition of parallel scans to PostgreSQL brings this system closer to having some major OLAP features.

1 The PostgreSQL Global Development Group, Documentation PostgreSQL 10.7

As of the time of writing this thesis, PostgreSQL supports the following parallel-aware scans:

– Parallel sequential scan: The table's blocks are divided and sequentially distributed among the partial plans (processes).

– Parallel bitmap heap scan: This is similar in operation to the parallel sequential scan, but the division of the table's blocks is carried out by a process chosen as leader. This leader process performs an index scan to build a bitmap identifying the blocks of the table required by the query plan under consideration.

– Parallel index scan: The table's blocks are not divided beforehand; instead, the processes take turns scanning data from the blocks 2.

• PostgreSQL grants high concurrency to multiple database system users by employing the MV2PL (multiversion two-phase locking) protocol. This is one of the many variants of multiversion concurrency control (MVCC) described in Section 2.2.2.1.

• By default PostgreSQL employs a B-tree index type and extends support to other index types like Hash, BRIN, etc.

• Similarly, parallel joins and parallel aggregation are supported by PostgreSQL. Parallel aggregation can be achieved with a Gather Merge node: every individual parallel process produces partial results as a partial aggregate node, and the cumulative aggregation is realized towards the end of the query plan.

• Of the four transaction isolation levels defined by the SQL standard, PostgreSQL supports three: Read Committed (its default), Repeatable Read and Serializable 3. Throughout our experiments, HTAPBench tests are conducted against two of these three isolation levels, Read Committed and Serializable, with variations in the target tps.
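The partial-aggregation idea behind parallel aggregation can be mimicked outside the database (this is a simulation of the concept, not PostgreSQL code): each worker computes a partial aggregate over its own chunk of the data, and the partial results are combined at the end, as a Gather Merge node would:

```python
# Simulation of parallel partial aggregation (not PostgreSQL internals):
# workers aggregate disjoint chunks, then the partial results are gathered
# and combined into the final aggregate.
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    """Worker-side partial aggregate over one chunk of rows."""
    return sum(chunk)

rows = list(range(1, 101))                     # rows 1..100
chunks = [rows[i::4] for i in range(4)]        # split among 4 workers

with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(partial_sum, chunks))

total = sum(partials)                          # final "gather" step
print(total)                                   # -> 5050
```

The pattern only pays off when the per-chunk work dominates the cost of splitting and gathering, which mirrors why PostgreSQL reserves parallel plans for sufficiently large scans.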

2.2.2.3 MySQL

MySQL is an open source relational database management system whose first version was released in 1995. Its development began with mSQL, which its authors eventually abandoned after finding it too slow [45]. It is currently owned and maintained by Oracle Corporation. MySQL supports state-of-the-art partitioning methodologies, cross-platform operation and multi-versioned storage engines. MySQL comprises multiple native storage engines like InnoDB (the default), MyISAM, Memory, CSV, Merge and Archive. Our HTAPBench tests are principally conducted over the default storage engine InnoDB, with a single demonstrative test over MyISAM to observe the discrepancies in system performance.

2 The PostgreSQL Global Development Group, Documentation PostgreSQL 10.7
3 The PostgreSQL Global Development Group, Documentation PostgreSQL 10.7

Figure 2.5: MySQL design architecture [6]

Figure 2.5 depicts the general logical architecture of the MySQL database management system, comprising three layers. The topmost layer handles client application connections, followed by a second layer containing the query processing units. The bottom layer depicts a few of the underlying storage engines supported by MySQL.

• InnoDB

It is a general-purpose storage engine compliant with the ACID properties and the default storage engine in MySQL. It employs row-level locking for transactions and multi-version read consistency with non-locking consistent reads (MV2PL) to enhance multi-user concurrency [45]. Data safety is ensured by InnoDB's crash recovery mechanism, and its buffer pool is devised to speed up transactions when granted a larger memory allocation. A crucial feature in the context of query execution is the adaptive hash index, built on top of the existing B-tree table index, which minimizes I/O for primary key lookups. The default index type used by InnoDB is the B-tree.

With suitable workloads and ample memory for the buffer pool, InnoDB can approach in-memory database-like performance. InnoDB's multiversion concurrency control mechanism follows the scheme discussed in Section 2.2.2.1: InnoDB retains information about old row versions to support concurrency and transaction rollbacks. MySQL replication features, such as replication with different engines on master and slave, are supported by InnoDB. All four transaction isolation levels elaborated in Section 2.2.2.1 are supported: Read Uncommitted, Read Committed, Repeatable Read (InnoDB's default) and Serializable.

• MyISAM

While it is no longer one of the most used storage engines, it was the sole storage engine supported by MySQL until the version 3 series. It is derived from the older, now deprecated ISAM storage engine, which was used in the first released version of MySQL. With the MySQL version used in our experiments, the MyISAM storage engine no longer supports data partitioning or data migration with partitioned MyISAM tables. However, MyISAM is still deployed in rare, well-suited cases requiring its particular extensions, which is the reason for its consideration within the scope of this thesis. We have conducted one HTAPBench test with the storage engine set to MyISAM, which will help us understand its performance discrepancies with InnoDB.

2.2. Workload-Specific Designs in Database Systems 21

2.2.3 Online Analytical Processing (OLAP) Database Systems

In this section, we present MonetDB, a database system designed to process analytical database workloads. We provide an overview of a few important features of OLAP database systems and MonetDB, which will help us reason, in later chapters, about the system performance observed in our experiments with HTAPBench.

2.2.3.1 Key Characteristic Features - OLAP

Database systems designed for OLAP operations exhibit specific features which yield no benefit for OLTP systems. A pure OLAP database system has the following distinct design features:

• Storage layout

A column-oriented storage layout is the best fit for analytical queries that scan large chunks of the tables. All the entries in a column are serialized so that they become adjacent. This is the key difference to row-oriented storage, which instead makes the entries of a record adjacent. The adjacent placement of column values in a column-oriented storage enables data scans at far higher rates than in an OLTP database system using row-oriented storage [46].
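As a rough sketch with hypothetical data (the table, column names and values are illustrative, not from any benchmark), the difference between the two layouts can be shown in Python: an analytical scan over one attribute must step over every record in a row layout, but reads a single contiguous list in a column layout.

```python
# Hypothetical two-column table (id, price_cents); not tied to any engine.
rows = [(1, 999), (2, 450), (3, 1200)]                        # row-oriented: records adjacent
columns = {"id": [1, 2, 3], "price_cents": [999, 450, 1200]}  # column-oriented: values adjacent

# Analytical scan: total of all prices.
row_scan = sum(r[1] for r in rows)      # must step over whole records
col_scan = sum(columns["price_cents"])  # reads one contiguous list
assert row_scan == col_scan == 2649
```

Both scans compute the same aggregate, but the columnar variant touches only the bytes of the scanned attribute, which is why column stores dominate for wide tables with selective projections.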

• Vectorization

Vectorization can be described as the practice of writing lightweight algorithms for column-oriented query processing. This differs from traditional query processing on data records (rows), which processes one tuple at a time.

Vectorization is the mechanism of processing queries on entire columns at a time, allowing fast scans over large amounts of data. Implementing such algorithms for a complete query plan is difficult, as scanning an individual record might at times be necessary depending upon the query statement. However, a major portion of the query plan can be designed to operate on columns [7].
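A minimal pure-Python sketch of this contrast follows; real engines use compiled tight loops and SIMD, and the data and function names here are illustrative assumptions. The query evaluated is SELECT sum(a * b) over a hypothetical two-column table.

```python
# Illustrative contrast between tuple-at-a-time and column-at-a-time
# evaluation of SELECT sum(a * b).
a = [1, 2, 3, 4]
b = [10, 20, 30, 40]

def volcano_style(row_iter):
    # Tuple-at-a-time: the operator pipeline is invoked once per record.
    total = 0
    for x, y in row_iter:   # one "next tuple" call per record
        total += x * y
    return total

def vectorized_style(col_a, col_b):
    # Column-at-a-time: each operator consumes and produces whole columns.
    products = [x * y for x, y in zip(col_a, col_b)]  # first columnar operator
    return sum(products)                              # second columnar operator

assert volcano_style(zip(a, b)) == vectorized_style(a, b) == 300
```

The per-tuple interpretation overhead of the first style is what vectorized engines amortize by running each operator over a whole column (or column slice) per call.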


Figure 2.6: Row-oriented v/s vectorized column-oriented query processing [7]

2.2.3.2 MonetDB

MonetDB is an open source column-oriented database management system designed to deliver efficient performance for complex queries over large databases. MonetDB is an ideal choice for data analytics, data mining, information retrieval (text), etc. It was the first database system to use vectorized query processing to speed up query execution. Its initial development in its current form, named simply Monet, took place at the University of Amsterdam in 2002 [18]. Its first licensed version was made available in 2004, and it was later released into the open source domain starting with MonetDB version 4 (MonetDB4).

Starting with the first release of the version used in our experiments, in 2015, MonetDB supports read-only horizontal partitioning of data, also known as data sharding, and persistent indices4. The key architectural design choices of MonetDB are listed below.

• Storage Model

Its design is principally focused on warehouse environments and bulk data processing, and is not adapted to high-volume transaction processing. It provides a complete SQL interface alongside ACID properties. Its storage model vertically fragments relational tables, storing each column as a separate binary table of (OID, value) pairs. This structure is known as a Binary Association Table (BAT) [47]. MonetDB accepts inputs as BATs or scalar values and relies on a low-level relational algebra termed BAT algebra. It supports query parallelism similar to PostgreSQL's query plan tree; the child plans here are replaced by BATs which store the intermediate results of an SQL query. The final result of a query is an aggregation of the results from the collection of BATs [47].

4MonetDB Jul2015 released, Release Notes
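The (OID, value) decomposition can be sketched as follows; the table contents and helper names are hypothetical, intended only to illustrate the BAT idea of selecting on one column and then reconstructing values from another:

```python
# Hypothetical sketch of MonetDB-style vertical fragmentation: each column
# of a relational table becomes a BAT of (OID, value) pairs sharing OIDs.
table = [           # logical relation: (name, salary)
    ("ann", 1000),
    ("bob", 1500),
]

def to_bats(rows):
    bats = {}
    for oid, row in enumerate(rows):
        for col_idx, value in enumerate(row):
            bats.setdefault(col_idx, []).append((oid, value))
    return bats

bats = to_bats(table)
# A BAT-algebra style selection on one column yields OIDs only...
hits = [oid for oid, v in bats[1] if v > 1200]
# ...which are then used to reconstruct values from another BAT.
names = [v for oid, v in bats[0] if oid in set(hits)]
assert names == ["bob"]
```

Each intermediate (here `hits`) is itself a column of values, which is what lets BAT algebra operators run as simple tight loops over arrays.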

• Execution Model

MonetDB’s kernel is programmed in it’s own low level language known as MonetDBAssembly Language (MAL) where any relational algebra operator conforms andmaps to MonetDB Assembly Language’s instruction. Complex queries expressionsare not accepted by the instruction as parameters with no degree of freedom.Nonetheless, it breaks and imparts complex expressions into sequential BAToperators which individually perform rather simple operations on separate columns.This is termed as bulk processing where a complex expression is handed over toBAT algebra operators in bits and pieces to act upon individually with high dataaccess granted by tight for-loops [47].

• System Architecture / Software Stack

MonetDB's query execution strategy comprises three software layers: the front end (top layer), the back end (middle layer) and the kernel (bottom layer). The top layer caters the query language, a user-level heuristic data model and a query language parser. The middle layer comprises optimizer modules embedded within the MAL optimizers framework, focused on the MAL plans. The bottom layer consists of a MAL interpreter, which in the initial design was located in the middle layer; it comprises a library embedding highly optimized implementations of the MAL instructions (operators) [47].

2.2.4 Hybrid Transactional and Analytical Processing (HTAP) Database Systems

We present in this section some key aspects of state-of-the-art hybrid transactional and analytical (HTAP) database systems capable of processing mixed OLTP and OLAP workloads. We walk through their design architectures and the key features supporting HTAP functionality.

HTAP database systems are alternatively described as OLTP database systems with added capabilities for business analytical query processing. HTAP systems are designed to process mixed workloads and thus inherit the characteristic features of both OLTP and OLAP database systems, with reasonable exceptions. The design goal of these systems is to draw better OLAP performance while still sustaining the desired transactional throughput. HTAP systems still preserve features like concurrency, workload isolation and query parallelism while processing mixed workloads.

2.2.4.1 MemSQL

MemSQL is a distributed, in-memory, relational database management system with itsfirst public version launched in 2013. It is a NewSQL database management system


designed to handle mixed workloads and provide real-time analytics over transactional data. MemSQL features a two-layer storage structure, an on-disk column store and an in-memory row store, enabling high concurrency and real-time analytics. Its data ingestion methodology, termed MemSQL Pipeline, allows large volumes of data to flow into the database engine at high throughput5.

• Architecture

A distributed architecture provides the functionality of partitioned workload and request processing. MemSQL deploys nodes in two roles: aggregator (master) nodes and leaf nodes.

– Aggregator nodes are the masters of any given cluster, responsible for receiving SQL queries, routing the fragmented partial queries across the leaf nodes and aggregating the acquired results for the client. Meta-data is located at the aggregator nodes.

– Leaf nodes receive the queries partitioned and distributed by the master nodes and process them over the data they store. Data distributed across the leaf nodes is hash-partitioned by the master node. This data is further partitioned at the leaf nodes, where each partition acts as an individual database [35].
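A toy sketch of this aggregator/leaf division follows; it is not MemSQL's actual implementation, and the node count, routing rule and function names are assumptions made for illustration only:

```python
# Illustrative aggregator/leaf sketch: the "aggregator" hash-partitions
# rows by key across leaf shards and combines partial results.
NUM_LEAVES = 3
leaves = [dict() for _ in range(NUM_LEAVES)]  # each leaf holds one data shard

def route(key):
    # Hash partitioning decided at the master/aggregator.
    return hash(key) % NUM_LEAVES

def insert(key, value):
    leaves[route(key)][key] = value

def aggregate_count():
    # The aggregator fans a COUNT(*) out to all leaves and sums
    # the partial results it gets back.
    return sum(len(shard) for shard in leaves)

for k in range(10):
    insert(f"row{k}", k)
assert aggregate_count() == 10
```

The essential point is that every query splits into per-shard partial queries whose results are cheap to combine, which is what lets adding leaf nodes scale both storage and query throughput.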

• HTAP Functionality

The system's high scalability is realized by MemSQL's support for horizontal partitioning of the data (sharding) into smaller parts (data shards) and their distribution among the nodes of a cluster. Any addition or removal of nodes is handled by automatic redistribution of the data shards across the nodes. While the disk storage (column store) is best suited for analytical workloads with larger data sets, the in-memory storage (row store) grants real-time analytics on transactional workloads [21]. Stale data is periodically flushed to disk storage to accommodate the most recent data in main memory. One major advantage of this dual-layer storage mechanism is that data from the two storage engines can be integrated into a single query. High concurrency is sustained, and the distributed query optimizer enables efficient system resource utilization. Query plans are cached for the speedy execution of subsequent queries with identical structure. Alongside lock-free data structures, MemSQL with its Multi-Version Concurrency Control (MVCC), as described in Section 2.2.2.1, grants high data availability6.

5https://docs.memsql.com/introduction/latest/how-memsql-works/
6https://docs.memsql.com/introduction/latest/how-memsql-works/#high-performance-for-oltp-and-olap-workloads

• Durability and Replication


MemSQL employs distinct durability mechanisms for the column store and the row store. For the main-memory row store, committed transactions are flushed to disk as log records and periodically compacted into database snapshots. A cluster guards against node failure by allowing the nodes to share replicated data. The column store, residing on disk storage, is not as vulnerable to failures as the in-memory row store; its durability is managed through the index structures used, skiplists.

2.2.4.2 CockroachDB

CockroachDB is a highly scalable, resilient transactional datastore with exemplary features like geo-distributed SQL, geo-replication, geo-partitioning, etc. CockroachDB was first launched as an open source project in 2014 and has been supported by seasoned contributors since then. The server architecture is designed to be, as its developers claim, almost impossible to take down. Although it is a transactional datastore, its high data availability alongside geo-distributed SQL enables moderate analytical performance. However, it favours transactional workloads over analytical workloads by employing a policy of "transactions first" [48].

Figure 2.7: CockroachDB architecture [8]

Figure 2.7 depicts the core architecture of CockroachDB. It is a multi-layered architecture, and we elaborate on every layer below.


• SQL layer: The SQL interface represents the topmost layer in the architecture and is the entry point for clients into the databases. The SQL statements from clients are transformed into key-value (KV) paired data.

• Transactional layer: This layer is responsible for one of the most important features of CockroachDB: consistency. It extends complete support for ACID properties to the multiple key-value (KV) pair entries.

• Distribution layer: Access to all the data in the cluster is granted to every nodeby a monolithic store map of the KV data. This map includes meta ranges whichdetail the data location in the cluster and the table data (meta-table) of the cluster.

• Replication layer: This layer is responsible for consistent data (KV) replicationacross nodes in a synchronized manner.

• Storage layer: This layer oversees the I/O or read and write operations on thedisk-storage [48].

Concurrency is ensured by the deployment of the multi-version concurrency control (MVCC) model. CockroachDB is designed to be compatible with PostgreSQL and inherits a few of its features. Queries are processed in a distributed manner across nodes, achieving high query parallelism. This feature is similar to the query parallelism mechanism of PostgreSQL described in Section 2.2.2.2. HTAP functionality is as yet only partially supported by CockroachDB, and work towards full HTAP support is underway. The default transaction isolation level in CockroachDB is Serializable and, in the current versions, all other isolation levels are automatically promoted to Serializable.

Having presented a thorough elaboration of the classification of workloads and introduced database systems peculiar to these workload classes, we provide an overview of workload-specific database benchmarks in the next section.

2.3 Workload-based Database Benchmarks

Benchmarks serve as a common reference for evaluating system performance. Described as a set of operations performed on database systems, database benchmarks help evaluate transactional throughput, application performance over large data volumes, cost estimates, etc. Since the early days of computing, benchmarks have been broadly categorized as hardware and software benchmarks. Hardware benchmarks test the capabilities and efficiency of the hardware the database runs on. Software benchmarks have had a challenging evolution, keeping pace with the perpetual invention of modern techniques in database systems and applications. With numerous vendors developing state-of-the-art database systems, it is crucial to evaluate these systems on common grounds. Today, industry standard benchmarks are developed and maintained by non-profit organizations like the Transaction Processing Performance Council (TPC) [29]. This


2.3. Workload-based Database Benchmarks 27

council has over the years been credited with the design and development of many industry standard benchmarks like TPC-C, TPC-E, TPC-H, TPC-A, etc., for the evaluation of applications and database systems. In the context of database systems, evaluating transactional throughput and analytical query efficiency covers the evaluation spectrum of any potential business requirement. We will put forth an overview of the key database benchmarks widely used for different target evaluations.

2.3.1 OLTP Benchmark - TPC-C

OLTP benchmarks, or transactional processing benchmarks, are designed to address evaluation questions related to transactional workload processing. Two of the most widely used OLTP benchmarks are TPC-C and TPC-E. While TPC-E is one of the latest industry standard benchmarks, TPC-C has been around for decades and is still extensively used, directly or indirectly, as a development platform for new benchmarks [33]. We will elaborate the design architecture of TPC-C, which is a core design choice in HTAPBench, the benchmark used throughout the benchmarking campaign of this thesis.

TPC-C

TPC-C was first approved by the Transaction Processing Performance Council (TPC) in 1992 as an enhanced alternative to the now obsolete benchmark TPC-A. The designed set of operations in TPC-C integrates a blend of five distinct concurrent transaction types over a complex database. It was designed to model the real-world workload of a wholesale supplier, as conferred to the council. The database schema of this workload is depicted in Figure 2.8, comprising nine distinct database tables. With the wholesale supplier's workload in focus, TPC-C serves all business segments which involve the management of retail or wholesale sales and purchases, i.e., almost every business unit on this planet.

Figure 2.8: TPC-C database schema [9]


Set of Transactions / Workload:

The workload comprises a set of operations typical of any real-world trade model.

• New-order: Entry of a new order data record upon request arrival from thecustomer.

• Payment: Updates the data record for customer balance to list the payment.

• Delivery: Updates the order status after delivering the order.

• Order-status: Fetches the status from the data record for a customer’s latest order.

• Stock-level: Retrieves the warehouse inventory status post the trade.

As can be observed from the transactions listed above, they represent a healthy blend of different transaction types. The first three transactions perform read-write operations, while the remaining two are read-only operations on a single data record. These transactions are configured with different weights to match the real-world scenario: 45% of all transactions are new-order transactions, while payment and delivery contribute 43% and 4% respectively. The two read-only transactions, order-status and stock-level, are configured at 4% each [9].
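The weighted mix above can be sketched with Python's `random.choices`; note this is only an illustration of the percentages, not the TPC-C specification's exact card-deck selection procedure:

```python
# Sketch of a TPC-C-style driver choosing the next transaction type
# according to the 45/43/4/4/4 mix (illustrative, not the spec's method).
import random

MIX = {
    "new-order": 45,
    "payment": 43,
    "delivery": 4,
    "order-status": 4,
    "stock-level": 4,
}

def next_transaction(rng=random):
    names = list(MIX)
    weights = [MIX[n] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]

assert sum(MIX.values()) == 100
assert next_transaction() in MIX
```

Over a long run, roughly 45% of the emitted transactions are new-order, which is why tpmC (new-order transactions per minute) is a meaningful fraction of the total throughput.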

Metric:

Benchmarks produce a final result in the form of a metric serving as a common reference across tests. TPC-C benchmark tests are interpreted through the evaluation metric transactions per minute (tpmC), which reveals the number of new-order transactions a database system can execute per minute. Another portion of the metric is the price-performance ratio, calculated as the sum of all costs, including hardware and software over a time span of 3 years, divided by the performance metric tpmC.

2.3.2 OLAP Benchmark - TPC-H

OLAP benchmarks, or analytical processing benchmarks, are designed to address evaluation questions peculiar to the OLAP workloads of decision support systems. OLAP benchmarks are framed with workloads operating over large volumes of data, usually terabytes, unlike the TPC-C read-only operations querying single data records. Two of the most widely used analytical benchmarks are TPC-H and TPC-DS. As with the OLTP benchmarks, we will study the design architecture of just the TPC-H benchmark, which is a crucial portion of the core design of HTAPBench.

TPC-H

TPC-H was first approved and developed in 1999 by the TPC council to evaluate the performance of decision support database systems. The designed set of queries in TPC-H is widely relevant to industry standard operations. The database schema and the populated data are modelled on real-world data to resemble industry operations. The TPC-H benchmark targets database systems which are required to scan large volumes of data, scrutinize complex queries and resolve demanding business questions. The set of operations in TPC-H includes complex ad-hoc queries alongside data modification operations performed concurrently. Figure 2.9 depicts the database schema of the OLAP workload employed by TPC-H, comprising 8 database tables.

Figure 2.9: TPC-H database schema [10]

Set of operations/Workload:

The workload integrates a set of 22 ad-hoc queries alongside 2 refresh functions. The decision support system processes these queries in parallel with the assistance of its query optimizer. The principal task of the 2 refresh functions is to perform update operations while flushing out stale data. All 22 queries are sequentially listed below [9]:

• Pricing Summary Report (Q1): Returns the billed amount of business jointly withthe shipped and returned quantities.

• Minimum Cost Supplier (Q2): Filters down a supplier in a given region to order aparticular part.

• Shipping Priority (Q3): Returns top 10 highest valued undelivered orders.

• Order Priority Checking (Q4): Assesses the order priority system based on customersatisfaction.

• Local Supplier Volume (Q5): Returns local suppliers revenue volume.


• Forecasting Revenue Change (Q6): Predicts added revenue benefits with certainconditions.

• Volume Shipping (Q7): Returns the calculated value of goods shipped betweennations.

• National Market Share (Q8): Returns a comparative response about a part’schange in market share within a given region of a particular nation.

• Product Type Profit Measure (Q9): Evaluates the profit revenue for a line of partsgrouped by supplier, nation and date.

• Returned Item Reporting (Q10): Classifies customers unsatisfied with the product.

• Important Stock Identification (Q11): Count the crucial stock peculiar to suppliers.

• Shipping Modes and Order Priority (Q12): Analyzes the impact of cost-effective shipment modes on delivery punctuality.

• Customer Distribution Query (Q13): Returns the relationship between customers and the magnitude of their orders.

• Promotion Effect Query (Q14): Audits customer response to promotion campaigns.

• Top Supplier Query (Q15): Characterizes top suppliers to extend special recogni-tion.

• Parts/Supplier Relationship (Q16): Returns the number of suppliers with thepotential to supply required parts.

• Small-Quantity-Order Revenue (Q17): Evaluates the average annual loss in revenue from unfulfilled small-scale orders.

• Large Volume Customer (Q18): Establishes customers relying upon the quantityof order.

• Discounted Revenue (Q19): Evaluates gross revenue peculiar to parts offered withdiscounted price.

• Potential Part Promotion (Q20): Acts like a recommender engine for potential promotion candidates.

• Suppliers Who Kept Orders Waiting (Q21): Selects suppliers with records ofdelayed shipments.

• Global Sales Opportunity (Q22): Identifies regions with potential customers [9].


Metric:

TPC-H records a performance metric to evaluate the registered performance. This metric is termed the Composite Query-per-Hour Performance Metric (QphH), which reflects the database system's query processing performance. The price-performance ratio is evaluated in a similar manner as the corresponding TPC-C metric. An additional index is recorded with the metric, revealing the size of the data over which the queries were processed.

2.3.3 Hybrid Transactional/Analytical Processing (HTAP) Benchmarks

OLTP and OLAP benchmarks evaluate database system performance precisely with industry-relevant data, transactions and queries, and have been industry standards for decades. However, with the emergence of database systems operating on mixed workloads, there is a legitimate business demand for the design and development of a benchmark with mixed workloads. Up until now, OLTP and OLAP benchmarks have been deployed over modern HTAP database systems to analyze their transactional and analytical capabilities. For instance, Ionut Hrubaru et al. [35] presented a performance evaluation of three in-memory database systems capable of handling mixed workloads using the TPC-H benchmark. They reported their evaluations with query processing times and the TPC-H metric QphH. However, such evaluations neglect a crucial aspect: the impact of processing an OLAP workload on the OLTP workload being processed on the same database system.

Businesses today rely heavily on real-time analytics and thus raise the bar for HTAP database systems. To filter down the best possible choice of database system for their business operations, it is mandatory to evaluate these HTAP systems against the desired performance objectives. This has triggered the design and development of benchmarks operating with mixed workloads and reporting a unified metric for the OLTP and OLAP workload processing performances registered by the SUT. We will discuss the design architecture of two such HTAP benchmarks: CH-benCHmark and HTAPBench.

2.3.3.1 TPC-Ch / CH-benChmark

Gartner [14] characterized mixed workloads as an OLTP application's analytical functionality comprising four principal components: continuous data loading, large numbers of standard reports, batch data loading, and unpredictable random ad-hoc query users. CH-benCHmark, proposed by Richard Cole et al. [38], models the first three of these four components, while the ad-hoc queries are modelled to be as close as possible to the TPC benchmarks.

CH-benCHmark is designed as an integration of the TPC-C and TPC-H benchmarks, processing the OLTP and OLAP workloads from these benchmarks cumulatively. The initial name of this design was TPC-CH, which through later editions was renamed


as CH-benCHmark. Figure 2.10 depicts the composed database schema employed byCH-benChmark.

Figure 2.10: CH-benCHmark database schema [10]

Workload:

The database schema cumulatively encloses 12 tables: the 9 TPC-C tables and 3 tables from TPC-H (Supplier, Nation and Region). All five TPC-C transactions described in Section 2.3.1 (New-order, Payment, Delivery, Order-status and Stock-level) are integrated into CH-benCHmark unaltered to assess the system's transactional throughput. These operations are performed on the 9 TPC-C tables without touching the remaining 3 tables. The number of warehouses and terminals utilized depends upon the scale factor of the data being loaded and is determined automatically.

For the assessment of analytical capabilities, CH-benCHmark employs the same 22 queries used in TPC-H, albeit transformed in the context of the extended TPC-C schema. These queries remain relevant to industry standards, as the underlying database schema represents real-world data. TPC-C transactions are processed continuously at high rates, updating the database, which leads to the exclusion of the 2 refresh functions used in TPC-H. These frequent updates have an intense impact on the analytical workload, as the data size is ever varying due to the transaction processing workload. The workload can thus consist of transactions only, queries only, or a combination of both.

The number of OLTP clients and decision support (OLAP) streams connected to the system at any given time defines the mixed workload composition. The two workloads run in relative isolation, configured before benchmark deployment.

Metric:

The performance metrics largely remain the same for transactional throughput and analytical query processing. OLTP throughput is measured with TPC-C's metric tpmC


while the TPC-H metric QphH measures the query processing capabilities. These two metrics are inversely related: higher transactional throughput implies growing data sizes, and query processing deteriorates as the amount of data to be scanned increases constantly. Performance also depends on other crucial factors peculiar to the database system, such as the storage layout, indexing, transaction isolation level and query parallelism mechanism.

2.3.3.2 HTAPBench

Coelho et al. [11] proposed a new benchmark to evaluate modern HTAP database systems with mixed transactional and analytical workloads: the hybrid transactional and analytical processing benchmark HTAPBench. Its design architecture is similar to that of CH-benCHmark, integrating the TPC-C and TPC-H benchmarks. However, a Client Balancer is introduced into the core design, which works on a feedback control mechanism to keep the transactional throughput within a tolerable threshold. This is crucial, as industry standard HTAP systems tend to register higher OLTP throughput while simultaneously processing analytical queries.

Figure 2.11 depicts the core design architecture of HTAPBench, with the Client Balancer occupying the topmost position.

Figure 2.11: HTAPBench architecture [11]

Client Balancer:

The Client Balancer launches 1 OLAP client every minute until the OLTP throughput falls below the set threshold. The feedback control mechanism continuously reports the registered throughput back to the Client Balancer, which releases the next OLAP client only if the OLTP throughput is sustained above the set threshold. The feedback mechanism


is depicted in Equation 2.1, which includes a proportional gain (KP) and an integral gain adjustment (KI). ∆t is 60 seconds, the time interval the Client Balancer waits before releasing the next OLAP client.

output = KP · ∆tps + KI · ∫∆t ∆tps dt (2.1)
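A minimal sketch of this feedback rule follows, assuming a simple discretization of Equation 2.1; the gain values KP and KI and the admission-rule details are illustrative assumptions, not HTAPBench's exact constants:

```python
# Sketch of a proportional-integral Client Balancer: the error
# delta = target_tps - measured_tps is weighted by KP, its running
# integral by KI, and a new OLAP client is admitted only while OLTP
# throughput stays within the tolerated threshold.
def make_balancer(target_tps, kp=1.0, ki=0.1, dt=60.0):
    state = {"integral": 0.0, "olap_clients": 0}

    def step(measured_tps, error_margin=0.2):
        delta = target_tps - measured_tps
        state["integral"] += delta * dt
        output = kp * delta + ki * state["integral"]   # discretized Eq. 2.1
        if measured_tps >= target_tps * (1 - error_margin):
            state["olap_clients"] += 1                 # launch next OLAP client
        return output, state["olap_clients"]

    return step

step = make_balancer(target_tps=100)
_, clients = step(measured_tps=100)  # throughput sustained: client launched
assert clients == 1
_, clients = step(measured_tps=70)   # below threshold: no new client
assert clients == 1
```

The integral term makes the controller sensitive to sustained deviations rather than single noisy samples, which matches the balancer's once-per-∆t decision cadence.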

Figure 2.12 depicts the execution cycle of HTAPBench. During the data load andwarm-up stage only the OLTP clients are active. After time ∆t, the first OLAP clientis launched and the system starts processing mixed workloads.

Figure 2.12: HTAPBench execution cycle [11]

Workload:

The database schema is exactly the same as in CH-benCHmark, with all 9 tables of the TPC-C benchmark and the 3 TPC-H tables Nation, Region and Supplier. The remaining TPC-H entities are integrated into the TPC-C workload in a non-invasive manner. All 5 TPC-C transactions remain unaltered. The dynamic query generator in the core architecture of HTAPBench generates queries over the changing database state. This is the first such approach to generate queries over the updated database, which ensures the analytical operations are carried out over the most recent data. The OLTP processing is majorly influenced by the target tps (transactions per second) set in the configuration file. To achieve the performance objectives set by the target tps, an optimal number of warehouses and a maximum number of allowed clients must be configured. The TPC-C specifications are used to gauge these crucial parameters [11].

target(tpmC) = target(tps) × 60 × (% New-Order / 100) (2.2)

# clients = target(tpmC) / 1.286 (2.3)

# warehouses = # clients / 10 (2.4)
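Equations 2.2 to 2.4 can be evaluated directly; for the 45% New-Order share and a target of 100 tps this yields roughly 2099 clients, matching the configuration reported later in this section. The flooring used here is an assumption, as the rounding rule is not stated explicitly:

```python
# Sketch of HTAPBench's parameter derivation (Equations 2.2-2.4).
# The int() flooring is an assumption made for illustration.
def htapbench_params(target_tps, new_order_pct=45.0):
    target_tpmc = target_tps * 60 * new_order_pct / 100  # Eq. 2.2
    clients = int(target_tpmc / 1.286)                   # Eq. 2.3
    warehouses = int(clients / 10)                       # Eq. 2.4
    return target_tpmc, clients, warehouses

tpmc, clients, warehouses = htapbench_params(100)
assert (tpmc, clients) == (2700.0, 2099)
```

Note that the formulas give about 210 warehouses for this target, while the reported experiment used 100; HTAPBench's density extraction may adjust these values beyond the plain formulas.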


Metric:

Previous approaches allowed the OLTP and OLAP executions to expand over time to achieve the desired throughput objectives. This unregulated growth in data size, caused by the transactions, eventually diminishes the OLAP performance. HTAPBench closed this research gap by introducing a regulated transactional execution. This leads to a sustained and pre-determined database growth. The need to normalize the OLAP results is eliminated, as the rate at which new data appears remains constant and predictable, clearing the path for a unified metric.

QpHpW = QphH / #OLAP workers @ tpmC (2.5)

The proposed unified metric is depicted in Equation 2.5. It is computed from the metrics of TPC-C (tpmC) and TPC-H (QphH) and the number of OLAP workers released by the Client Balancer throughout the test execution time. The unified metric QpHpW, or "Queries of type H per Hour per Worker", can be described as the number of queries executed by an individual OLAP worker while sustaining the registered tpmC. It is thus a measure of an OLAP worker's efficiency while still sustaining the target tps.
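Equation 2.5 is a simple ratio; plugging in the figures reported by Coelho et al. [11] later in this section (7 QphH over 50 workers for the OLTP SUT, 123 QphH over 4 workers for the OLAP SUT) reproduces the reported QpHpW values:

```python
def qphpw(qphh, olap_workers):
    # Unified HTAPBench metric (Equation 2.5): QphH divided by the number
    # of OLAP workers, reported "@" the sustained tpmC.
    return qphh / olap_workers

assert qphpw(7, 50) == 0.14    # OLTP SUT run: 0.14 @ 756 tpmC
assert qphpw(123, 4) == 30.75  # OLAP SUT run: 30.75 @ 217 tpmC
```

The per-worker normalization is what makes the two SUTs comparable despite their very different worker counts.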

We present the initial results acquired by Coelho et al. [11] to gain initial insight into how to interpret the acquired results.

OLTP SUT

The experiments over an OLTP system under test (SUT) were conducted on a server with an Intel Xeon E5-2670 v3 CPU with 24 physical cores, running Ubuntu 12.04 LTS. HTAPBench was deployed over an OLTP row-oriented engine with the following configuration:

• target tps : 100

• error margin: 20% (default)

These configurations approximate to the following as per the density extraction mecha-nism employed within HTAPBench:

• active OLTP clients : 2099

• warehouses : 100

• data: 117GB


The results plotted in Figure 2.13 are an average of 5 independent runs with an execution time of 60 minutes under the same configuration. The line joining the plotted points represents the OLTP throughput, while the staircase-like structure depicts the launch of OLAP clients. The OLTP system under test (SUT) was able to achieve the target tps in the very first minute of the test and sustained it almost throughout the execution time. The lower tps boundary given by the 20% error margin was crossed 50 minutes after the start of the test, pushing the system into saturation. The Client Balancer was able to launch 50 OLAP workers prior to the system saturation.

Figure 2.13: Initial acquired results over OLTP SUT [11]

The SUT sustained 756 tpmC and 7 QphH. With 50 OLAP workers launched before the SUT saturated, the QphH per OLAP worker equals 0.14. The unified metric of HTAPBench, QpHpW ("Queries of type H per Hour per Worker" [11]), in this test thus amounts to 0.14 @ 756 tpmC. Even with the high number of OLAP workers launched, the QphH per worker remained low. The number of OLAP workers launched is directly proportional to how long the SUT sustains the target tps, irrespective of the efficiency of the OLAP workers at query execution.
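The Client Balancer behaviour described above can be sketched as a simple feedback loop (our own simplification for illustration; HTAPBench's actual controller is more involved than this per-minute rule):

```python
def client_balancer(throughput_per_min, target_tps, error_margin=0.20):
    """Toy model of HTAPBench's Client Balancer: release one OLAP
    worker per minute while the OLTP throughput stays above the lower
    error-margin boundary; stop once the system saturates."""
    lower_bound = target_tps * (1 - error_margin)
    workers = 0
    for tps in throughput_per_min:      # one throughput sample per minute
        if tps < lower_bound:           # saturation: stop releasing workers
            break
        workers += 1                    # launch a new OLAP worker
    return workers

# Synthetic 60-minute trace shaped like the OLTP SUT above: the target
# of 100 tps is sustained for ~50 minutes before saturation.
trace = [100] * 50 + [70] * 10
assert client_balancer(trace, target_tps=100) == 50
```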

OLAP SUT

For the tests over an OLAP SUT, the system configuration of the OLTP experiments is reused. HTAPBench was deployed over an OLAP column-oriented engine with additional configuration. The results obtained are an average of 5 independent tests with 60 minutes execution time.


Figure 2.14: Initial acquired results over OLAP SUT [11]

The results plotted in Figure 2.14 depict the same first-minute behaviour of reaching the target tps as with the OLTP SUT. However, the SUT sustained the OLTP throughput only until the end of the 4th minute of the execution time. The Client Balancer was able to launch 4 OLAP workers before system saturation. The number of OLAP workers released is very low compared to the OLTP SUT, where the system was able to sustain its OLTP throughput almost until the end of the execution time, allowing the Client Balancer to launch 50 OLAP workers. This behavior of the OLAP SUT can be explained by its column-oriented design, which is not meant to address OLTP activity.

The SUT registered a TPC-C metric of 217 tpmC and a TPC-H metric of 123 QphH. With 4 active OLAP workers, the unified metric QpHpW evaluated to 30.75 @ 217 tpmC. The OLTP throughput is weak compared to the OLTP SUT, which registered 756 tpmC. However, the OLAP metric peaked at 123 QphH, while the OLTP SUT registered merely 7 QphH.

The number of OLAP clients released was higher with the OLTP SUT, and it is a measure of how long the SUT sustains the target system performance under the OLTP workload. The registered QpHpW is an assessment scale for efficient OLAP performance. With 50 OLAP workers, the OLTP SUT was able to register just 0.14 QpHpW. This value was 30.75 for the OLAP SUT with just 4 active OLAP clients. Clearly, the efficiency of the OLAP clients is better than that of the first SUT.

HTAP SUT

The database system deployed used a distributed architecture. The cluster was configured to hold a total of 10 nodes: 1 master node holding the distribution system metadata, and the remaining 9 with database storage and query processing responsibilities. Each node in the cluster was deployed on a system with the following configuration:

• Processor: Intel i3-2100-3.1GHz (64 bit)


• Cores: 2 physical (twice as many virtual cores)

• Main-memory: 8 GB of RAM

• Disk-storage: 1 SATA II

• Operating System: Ubuntu 12.04 LTS

• Ethernet Protocol: Gigabit Ethernet network [11]

Figure 2.15: Initial acquired results over HTAP SUT [11]

The results from the plot in Figure 2.15 show that the SUT was able to achieve the target tps in the first minute of the test and sustained it until the end of the 12th minute. Once saturated, the OLTP throughput gradually decreased until the end of the execution time. The final metric aggregated at 14.14 @ 530 tpmC, with 12 OLAP workers released by the Client Balancer. The registered OLTP throughput held at 530 tpmC while sustaining 169 QphH. These results place the behaviour of the SUT precisely in the estimated space between the observed results for the OLTP and OLAP SUTs: the registered OLTP throughput is lower than that of the OLTP SUT and higher than that of the OLAP SUT. The SUT was able to sustain the highest recorded QphH; however, the measure of success with query execution relies on QpHpW, which was observed to be moderate compared to the results of the OLTP and OLAP SUTs. The hybrid SUT enhanced the TPC-H metric with speedy query executions while tolerating a reasonable trade-off in OLTP throughput.


SUT      #OLAP   QphH   QpHpW @ tpmC
OLTP       50      7     0.14 @ 756
OLAP        4    123    30.75 @ 217
Hybrid     12    169    14.14 @ 530

Table 2.1: Overview of the initial published results [11]

Figure 2.16: Quadrant plot - HTAPBench unified metric

As depicted in Figure 2.16, the difference in behavior of the various SUTs can be identified by their location in the HTAP functionality spectrum. The OLTP SUT, as elaborated earlier in Figure 2.13, resides low in the number of queries, while the OLAP SUT mastered the number of queries but remained weak in OLTP throughput. The hybrid SUT lies approximately near the center of the plot, reflecting its performance with mixed workloads: real-time analytics with a reasonable trade-off in OLTP throughput.

Unfortunately, the authors evaluate neither scale factors nor configurable parameters. Furthermore, they do not report precise information on the systems tested. We therefore seek to fill these gaps with our research.

2.4 Summary

We conclude this chapter by listing some key takeaways:

• A database workload is a set of operations to be processed on a database system. Categorized by the nature of the operations, workloads are divided into 3 main categories: transactional (OLTP), analytical (OLAP) and mixed or hybrid workloads (HTAP).

• Workload management is a series of actions to process arriving requests by efficiently utilizing the system's resources and achieving the target performance objectives. We learned a taxonomy to classify workload management mechanisms based on their distinct characteristics.

• OLTP database systems have distinct design traits, like a row-oriented storage layout for low-latency write operations, and multi-version concurrency control to grant higher concurrency to multiple users.

• OLAP database systems employ column-oriented storage engines and vectorization for prompt query processing on entire columns, unlike OLTP systems, where queries process one data record at a time.

• HTAP database systems combine the features of OLTP and OLAP database systems into the best composition to process queries over transactional data while still sustaining high transactional throughput.

• Database benchmarks are sets of fixed operations (transactions and queries) which evaluate system performance by producing a performance metric. TPC-C, the widely used benchmark for OLTP workloads, evaluates a system's transactional throughput with the metric New-Order transactions per minute (tpmC). For OLAP workloads, TPC-H evaluates the system's analytical performance with the metric queries per hour (QphH).

• We introduced benchmarks operating on mixed workloads: CH-benCHmark and HTAPBench. CH-benCHmark integrates the transactional and analytical benchmarks TPC-C and TPC-H to process OLTP and OLAP workloads on a single system. However, no unified metric is proposed to represent the system performance for mixed workloads.

• HTAPBench integrates TPC-C and TPC-H just like CH-benCHmark. However, a Client Balancer is introduced into the core design to regulate the OLTP transactions and the release of new OLAP clients using a feedback control mechanism. Queries are generated for the changing workloads by a dynamic query generator. A unified metric is proposed for the system performance with hybrid workloads: queries of type H per hour per worker (OLAP client), QpHpW @ tpmC.


3. Prototypical implementation and research questions

In this chapter we present the prototypical implementation of the benchmark tests, the research questions and the database systems considered within the scope of our work. We organize this chapter with the following structure:

• Research Questions: We begin by presenting the research questions to be addressed in this thesis in Section 3.1, for the different systems under test.

• Prototypical Implementation: In Section 3.2 we lay out the prototypical implementation and the evaluation prototype for the interpretation of results.

• Database Systems: In Section 3.3 we list the database systems considered for evaluation using HTAPBench, and their configurable parameters which influence the performance of the system under test.

3.1 Research Questions

We put forth the following research questions to be answered during the course of our work:

1. OLTP: To what extent do configurable parameters, such as isolation levels or system-specific ones, influence or enhance the system performance and isolation among clients, in an HTAP workload, for OLTP systems? What is the maximum QphH the system under test can register while sustaining a given target tps?

2. OLAP: To what extent do configurable parameters, such as isolation levels or system-specific ones, contribute to or impact the system performance, the isolation among clients, and the OLAP clients' efficiency, while serving HTAP workloads? Considering that the objective of this engine is not the support of OLTP activities, what is the maximum tpmC the system under test can register while sustaining a maximum number of active OLAP clients and QphH?

3. HTAP Systems: To what extent do configurable parameters, such as isolation levels or system-specific ones, contribute to or impact the system performance and the isolation among clients, in an HTAP workload? What is the maximum tpmC and QphH the SUT can sustain, and how distant are these results from the OLTP and OLAP tests?

3.2 Evaluation prototype and work process

Figure 3.1: Prototypical implementation cycle

The prototypical implementation cycle is depicted in Figure 3.1 and includes the following course of actions:

• Install the database selected for evaluation using HTAPBench.

• Examine the potential system parameters with possible impact on system performance, within the permissible range of the benchmark.


• Frame the configuration files with different compositions of the chosen parameters.

• Launch the benchmark test for the first in-line configuration. Repeat the test for all the other configurations framed earlier.

• Collect the results and compute the unified metric.

• Evaluate the system performance using the unified metric in the spectrum ofHTAP functionality.

The work process flow for a test with HTAPBench is similar to that of the traditional transactional and analytical benchmarks TPC-C and TPC-H. The process follows this series of actions:

1. Build HTAPBench: HTAPBench is built with Maven using a Java 1.8 distribution.

2. Configure HTAPBench: This begins with the database system installation and configuration, with the required memory and maximum client allowance for the experimental setup. Next in line is to create a database through the SQL client interface, along with a user/password (this user is later configured in HTAPBench to access the created database) holding all the required privileges. The database schema is installed through the command line interface as shown in Figure 3.2. Changes to the DDL file, to SQL dialects, or to the HTAPBench code base itself (for example, to handle the use of prepared statements for MonetDB) are included in this stage.

Figure 3.2: HTAPBench - Installing database schema [11]

3. Populate: Loading data into the database can be done in two possible ways: via CSV files or directly from HTAPBench. We have employed the latter method throughout our work, as shown in Figure 3.3. The configured target tps triggers the workload generation for different scaling factors. The database was freshly loaded prior to each test.

Figure 3.3: HTAPBench - Data loading [11]


4. Run tests: After successful data loading, the HTAPBench test is launched and the logs are interpreted to tabulate and plot the results under different configurations. The window size used to group the results was set to 120 seconds, as can be seen in Figure 3.4. Errors captured during this phase usually required us to restart the process from the first stage.

Figure 3.4: HTAPBench - Launching the test [11]

3.2.1 HTAPBench Requirements

• The machine running HTAPBench must have a Java distribution (> 1.7) installed.

• Additional Maven dependencies are required to install the JDBC driver for the individual database systems to be tested. [11]

3.2.2 Experimental Setup

The tests were deployed on a system with the following configuration:

• 1 node device with 2 CPUs

• Processor: Intel(R) Xeon(R) Gold 6150 CPU @ 2.70GHz

• Cores: 18 cores per CPU

• Threads: 2 per core

• Cache: 50688 KB

• Main-memory: 376 GB

• HDD: 24 TB

• Operating System: Ubuntu 18.04.2 LTS

Since we have already described the HTAPBench benchmark in Section 2.3.3.2, we refer the reader to that section for details on the evaluation benchmark used.


3.3 Database Systems

Benchmark tests produce varying results with different database management systems and engines. It is important to run the tests across different engines for a substantial evaluation of system performance, and a better understanding of how the benchmark behaves. The approach in this thesis is to consider a transactional OLTP engine, an analytical OLAP engine and HTAP engines for benchmark testing. Each engine will be evaluated with different supported isolation levels and scaling factors, apart from system-relevant parameters. HTAPBench spans three stages of execution (populate, warm-up and execution), where the data is freshly loaded for every test to avoid overwrites [11]. The results comprise TPC-C's tpmC and TPC-H's QphH metrics in addition to HTAPBench's unified metric.

3.3.1 OLTP Database Systems

Two database systems inclined towards transactional OLTP behaviour are tested and evaluated:

• PostgreSQL: An open-source object-relational database system1. The PostgreSQL version deployed for our experiments is 10.7, released on 2019-02-14.

• MySQL: An open-source relational database management system. The MySQL version used for our experiments and evaluation with HTAPBench is 8.0.16, the latest release at the time the experiments were conducted.

The same steps from Section 3.2 are followed sequentially. Tests are implemented for the supported isolation levels, the supported primary storage engines, and target tps of 1, 2, 3 and 4, corresponding to scaling factors 3, 5, 7 and 9, respectively. A run time of 60 minutes is configured for one test per transaction isolation level, while the remaining tests with different configurations were scheduled for 15 minutes each. The test with an execution time of 60 minutes serves only to illustrate the larger trend (which also holds for the 15-minute tests). Generally, leaving the tests running longer is important for systems with a high number of clients, where the metrics require time to be averaged to their true values. After a trial-and-error process we were able to validate that, for our experiments, 15 minutes were enough for a proper system evaluation.

For all our experiments we performed 4 repetitions of each test, and we report the average values for throughput and for the metrics across these executions.

3.3.2 OLAP Database Systems

The OLAP engine considered is MonetDB, an open-source column-store database system. For the experiments with HTAPBench, MonetDB5 is deployed with database server toolkit version v1.1. HTAPBench was recompiled with the added dependencies of the MonetDB JDBC driver. Tests are launched in an identical manner to those of the OLTP engines, and the results are aggregated similarly. The focal points considered here are the number of active OLAP streams and the tpmC and QphH sustained by the system, in analogy to the OLTP engines.

1 The PostgreSQL Global Development Group, Documentation PostgreSQL 10.7

3.3.3 HTAP Database Systems

The final leg of the thesis is deploying HTAPBench on HTAP database systems. The two systems considered are:

• MemSQL: A scalable, distributed, in-memory database management system. In our experiments we used MemSQL 6.7.

• CockroachDB: A scalable, consistently-replicated, transactional datastore [28]. For our experiments we used version 19.1.2 of CockroachDB.

HTAPBench is recompiled for each engine with the addition of the JDBC driver dependencies for the respective engine. Benchmark tests are performed with the supported isolation levels and varying target tps. The results from the HTAP engines are compared against different references from the OLTP and OLAP engines.

3.4 Summary

Key takeaways from this section are listed as follows:

• Research questions pertaining to OLTP, OLAP and HTAP database systems.

• Process flow for a test using HTAPBench, following the sequential procedure: build, configure, populate and run.

• Brief introduction to the database systems considered.

• The following chapters will present the results and their interpretations for the databases introduced.


4. Evaluation of OLAP and OLTP Database Systems with HTAPBench

In this chapter, we present the results of our evaluation of OLTP and OLAP database systems with HTAPBench. The chapter is organized as follows:

• Research Questions: We recapitulate the research questions that we answer in this chapter in Section 4.1.

• Results Interpretation - OLTP: We present the results for our experiments with PostgreSQL and MySQL, considering variations in isolation levels, target tps, and, when pertinent, storage engines, in Section 4.2.

• Results Interpretation - OLAP: In Section 4.3 we show our results from experiments with MonetDB, considering variations in isolation levels.

4.1 Research Questions

Recapitulating the research questions pertaining to this chapter, the following research questions will be addressed with our experiments:

1. OLTP: To what extent do configurable parameters, such as isolation levels or system-specific ones, influence or enhance the system performance and isolation among clients, in an HTAP workload, for OLTP systems? What is the maximum QphH the system under test can register while sustaining a given target tps?


2. OLAP: To what extent do configurable parameters, such as isolation levels or system-specific ones, contribute to or impact the system performance, the isolation among clients, and the OLAP clients' efficiency, while serving HTAP workloads? Considering that the objective of this engine is not the support of OLTP activities, what is the maximum tpmC the system under test can register while sustaining a maximum number of active OLAP clients and QphH?

In order to carry out this evaluation, the permissible changes to the benchmark configuration file are the number of warehouses, the target tps, the transaction isolation level (according to the ANSI standard) and the overall test execution time.

4.2 Results over OLTP Database Systems

In this section we discuss the results acquired from HTAPBench tests with the two OLTP database systems considered: PostgreSQL and MySQL.

By testing OLTP systems, we can evaluate how the systems behave for the traditional TPC-C benchmark while withstanding an increasing number of OLAP clients performing TPC-H queries, subject to the constraint of maintaining a target tps. In addition, we can observe the QphH of the system, which reports the behavior of clients with the TPC-H benchmark. Given their enhanced capabilities for sustaining OLTP workloads (e.g., Wu et al. show that PostgreSQL enables low-latency scans, whereas MySQL with InnoDB achieves comparatively high throughput in the TPC-C benchmark [27]), it is insightful to better understand the trade-offs of these systems under an increasing number of concurrent OLAP queries. We perform such a study using the HTAPBench benchmark.

4.2.1 Results Interpretation and Evaluation: PostgreSQL

HTAPBench tests were conducted over PostgreSQL with different system configurations, for 2 of the supported transaction isolation levels and for varying target tps of 1, 2, 3 and 4.

One example of a configuration file is shown in Figure 4.1. The parameters for the driver and the dburl need to be modified as pertains to the system under test (SUT). The user details need to be cross-referenced with the user created through the administrative interface of the database, with all the required privileges granted (root privileges).

The number of warehouses (35) and the maximum allowed OLAP workers (i.e., 10, a number which is never reached by the systems under test) remain unaltered during the course of our work. This is because the Client Balancer component of the benchmark utilizes as many OLAP workers and warehouses as derived from the target tps, whose highest value set during the experiments is 4. Thus, these two parameters suffice for the test requirements and remain unchanged. It is important to observe the transaction weights to understand the essence of the workload. The initial five weights 45, 44, 4, 4 and 3 refer to the 5 TPC-C transactions, and the remainder to the 22 TPC-H queries. The sum of these weights, as determined by the benchmark, must remain 200. The scaling factors for the different target transactions per second are tabulated in Table 4.1.
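The weight scheme can be sanity-checked as follows (a sketch; the 22 individual TPC-H weights below are illustrative placeholders, since only their total is constrained here):

```python
# Transaction weights from the HTAPBench configuration (cf. Figure 4.1).
# The first five weights belong to the 5 TPC-C transactions; the 22
# TPC-H query weights are hypothetical values chosen only to satisfy
# the benchmark's constraint that all weights sum to 200.
tpcc_weights = [45, 44, 4, 4, 3]
tpch_weights = [5] * 12 + [4] * 10      # 22 illustrative query weights

assert sum(tpcc_weights) == 100         # TPC-C share of the mix
assert len(tpch_weights) == 22          # one weight per TPC-H query
total = sum(tpcc_weights) + sum(tpch_weights)
assert total == 200, f"weights must sum to 200, got {total}"
```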

Figure 4.1: System configurable parameters

target tps   Scaling Factor
    1              3
    2              5
    3              7
    4              9

Table 4.1: Target tps vs. Scale Factor
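Table 4.1 suggests a simple linear relation between the configured target tps and the derived scaling factor. The closed form below is our own observation from the table, not a documented HTAPBench formula:

```python
def scale_factor(target_tps: int) -> int:
    """Scaling factor as observed in Table 4.1 (sf = 2 * tps + 1)."""
    return 2 * target_tps + 1

# Cross-check against the tabulated pairs.
observed = {1: 3, 2: 5, 3: 7, 4: 9}
for tps, sf in observed.items():
    assert scale_factor(tps) == sf
```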

The plotted results for the transaction isolation level Read-Committed in Figure 4.2 characterize the behaviour of the SUT. This is the lowest of the isolation levels in terms of consistency among those supported by PostgreSQL1, and the one that enables the highest concurrency.

1 Although PostgreSQL supports Read Uncommitted, it does so in a way that is not clearly distinguishable from Read Committed. Similarly, though PostgreSQL supports a Repeatable Read isolation level, it did not work in our evaluations and hence we do not include it.

Figure 4.2: PostgreSQL registered performance - Read-Committed: (a) target tps 1; (b) target tps 2; (c) target tps 3; and (d) target tps 4. (Each panel plots OLTP throughput in txn/sec and the number of OLAP workers against time in minutes.)

For this isolation level we report results similar to the previous work of Coelho et al. [11], though it is not possible to ascertain this fully, since the authors do not report the transaction isolation configuration used in their study.

The SUT achieved the target tps in the very first minute for all configurations (most notably in the case of 1 tps), and it sustained its performance until the 7th minute, plunging into saturation thereafter, as seen in Figure 4.2(a). Overall, the Client Balancer was able to launch 6 OLAP workers throughout the entire test. The linear interpolation connecting the plotted points (in blue) depicts the TPC-C throughput along the evaluation period, and the number of OLAP workers launched during the experiment is represented by the plotted line featuring a staircase pattern (in black). In comparison to the authors' results, the number of active OLAP workers in our evaluation was very low (6 in our case, compared to 50 in theirs). This is reasonable considering the big difference between the experimental environments and the limited knowledge of the specific systems they tested. Regardless, the performance pattern is similar.

In the 60-minute execution for target tps 1, the SUT was able to sustain a throughput of 8.49 tpmC and 2.99 QphH. The unified metric QpHpW with 6 active OLAP workers tallies to 0.50 @ 8.49 tpmC, which corresponds to the performance expected from an OLTP engine.

target tps   #OLAP    QphH   QpHpW @ tpmC
    1          6      2.99    0.50 @ 8.49
    2          6      7.19    1.19 @ 27.18
    3          6     19.79    3.29 @ 38.1
    4          3     10.20    3.4  @ 55.73

Table 4.2: PostgreSQL unified metric: Read-Committed
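The QpHpW column in Table 4.2 can be cross-checked against Equation 2.5: it is simply QphH divided by the number of OLAP workers. A quick sketch (the values match the table to within rounding/truncation):

```python
# Rows of Table 4.2: (target tps, #OLAP workers, QphH, reported QpHpW).
rows = [
    (1, 6, 2.99, 0.50),
    (2, 6, 7.19, 1.19),
    (3, 6, 19.79, 3.29),
    (4, 3, 10.20, 3.40),
]

for tps, workers, qphh, reported in rows:
    qphpw = qphh / workers          # Equation 2.5: QpHpW = QphH / #OLAP
    # The thesis appears to truncate to two decimals, so allow 0.01 slack.
    assert abs(qphpw - reported) < 0.01, (tps, qphpw, reported)
```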

From our results we observe the expected behavior that increasing the target tps will limit the number of clients spawned, to an extent. This is the case since a tps of 1 achieves 6 OLAP workers, but a tps of 4 achieves only 3.

The metrics for TPC-C, TPC-H and the unified HTAPBench metric are tabulated in Table 4.2. With increasing target tps, the tpmC the SUT sustained rose gradually alongside the QphH. One might expect higher transaction execution to result in fewer queries of type H being executed. However, increasing the target tps produces an upsurge in query executions. This can be understood from the configuration file depicted in Figure 4.1, where any increment in the target tps eventually scales up the weights for all TPC-C transactions and TPC-H queries together. With higher tps values we naturally observe a higher tpmC and a higher number of queries. We also note that with increasing tps, the TPC-H throughput will eventually be hindered by the longer execution times (per-query latency), leading to a decreasing QphH. We observe this for the case of tps 4.

The results for the transaction isolation level Serializable differ from those acquired for the Read-Committed isolation level, as seen in Figure 4.3(a) and Table 4.3. To understand the difference in behavior with respect to the Read-Committed isolation level, we need to recall PostgreSQL's serializable isolation from Section 2.2.2.1. This is the strictest of all the isolation levels supported by PostgreSQL, and it enforces an essentially sequential execution. The lower tpmC observed in these experiments can be explained by the frequent transaction failures and the corresponding transaction rollbacks; similarly for the lower number of OLAP clients launched.

It is interesting to note that the results are not entirely as expected when considering the number of OLAP clients, since the clients oscillate between 2 and 3 when changing the target tps. We judge that these differences are not high enough to be considered important, so we do not draw conclusions from them.

It is also interesting to note that, while sustaining a lower tpmC and fewer clients, the SUT with this isolation level maintains a higher QphH of 12.56 through the test execution for the base case, and increasing numbers for the different target tps values. This confirms the expected pattern, with improvements in the analytical performance obtained by means of reductions in the transactional performance, given a high number of available queries [3].

The unified metric QpHpW sustained by the SUT evaluates to 4.18 @ 3.25 tpmC, with 3 OLAP workers launched prior to system saturation.

Finally, we also note that increasing the target tps improves the analytical performance measures, as expected given the scaling of the queries, but only to an extent: after a tps of 3, the analytical performance falls slightly.

target tps   #OLAP    QphH   QpHpW @ tpmC
    1          3     12.56    4.18 @ 3.25
    2          2     21.50   10.75 @ 4.85
    3          2     28.18   14.09 @ 6.72
    4          3     31.98   10.66 @ 6.68

Table 4.3: PostgreSQL unified metric: Serializable

At the time of writing this thesis, there is no metric or technique in HTAPBench to evaluate the data freshness actually provided by the SUT. This implies that, when changing the isolation level, the data available for the TPC-H queries remains unchanged. As a result, the concurrency configurations only affect the TPC-C transactions, but do not stall or cause rollbacks for the TPC-H queries. This fact also explains the relatively high analytic performance observed when increasing the isolation level. Had there been an interaction between the isolation configuration and the TPC-H queries, the analytic performance could have been expected to be lower.

Figure 4.3: PostgreSQL registered performance - Serializable: (a) target tps 1; (b) target tps 2; (c) target tps 3; and (d) target tps 4. (Each panel plots OLTP throughput in txn/sec and the number of OLAP workers against time in minutes.)

With this we conclude our presentation of results for PostgreSQL. In the next section we present our results for an alternative OLTP system: the MySQL database.

4.2.2 Results Interpretation and Evaluation: MySQL

The HTAPBench tests conducted with MySQL are identical to those with PostgreSQL, except for the inclusion of two additional transaction isolation levels supported by MySQL: Read-Uncommitted and Repeatable-Read. All four isolation levels are configured with target tps 1, 2, 3 and 4, and the collected results are plotted and tabulated for interpretation. The experimental setup remains unchanged.

The data definition language (DDL) file provided by HTAPBench, and the OLAP queries themselves (the SQL syntax), required some alterations to match the underlying database system's support.

Figure 4.4: MySQL JDBC connection configuration

A snippet from the configuration file with the required changes is depicted in Figure 4.4. One of the crucial adjustments, as shown in the configuration, is to set the server timezone bypassing daylight saving time, which helps to include data that could otherwise be lost. The scale factors remain the same as described in Table 4.1. The other steps to set up the experiments are followed sequentially as illustrated in Section 3.2.

Considering the different storage engines, we include in our evaluation tests using two of them: MyISAM and InnoDB. InnoDB implements fine-grained transactions and is an improvement over the simpler MyISAM. As a result, it is worthwhile to study this difference, as illustrative of the transaction flexibility parameter described by Psaroudakis et al. [3]. Our expectation is that MyISAM represents a more restrictive transactional flexibility setup than InnoDB does. This, according to the aforementioned work [3], should result in a better transactional performance (and a lower analytic performance) for MyISAM. However, other aspects of the storage engine configuration, such as the different approaches to table compression, or the read optimizations introduced for MyISAM, could also influence our observations. A total of 16 tests for InnoDB and 1 for MyISAM were implemented.

4.2.2.1 Transaction Isolation: Read-Uncommitted

Figure 4.5: MySQL registered performance: Read-Uncommitted and target tps 1

target tps   #OLAP   QphH   QpHpW @ tpmC
1            7       3.75   0.53 @ 8.97
2            5       4.79   0.95 @ 23.96
3            6       8.39   1.39 @ 35.69
4            6       2.79   0.46 @ 46.54

Table 4.4: MySQL unified metric: Read-Uncommitted

The first set of results pertains to tests configured with the transaction isolation level Read-Uncommitted. This is the lowest isolation level, and we expect it to lead to a higher tpmC than the alternative isolation levels. The plotted results for experiments with target tps 1 are depicted in Figure 4.5. The SUT achieved the target tps in the first minute and sustained it until the 8th minute. The Client Balancer was able to launch 7 OLAP workers, one every minute, before the system reached saturation. The TPC-C and TPC-H metrics stand at 8.97 tpmC and 3.75 QphH respectively. With 7 active OLAP workers, the unified metric QpHpW equals 0.53 @ 8.97 tpmC.
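The unified metric reported in these tables can be reproduced directly from its definition: QpHpW is the QphH divided by the number of active OLAP workers. A minimal sketch in Python, checked against the Read-Uncommitted rows of Table 4.4 (the small residuals come from the benchmark's rounding of the reported values):

```python
def qphpw(qphh: float, olap_workers: int) -> float:
    """Unified HTAPBench metric: analytic queries per hour, per OLAP worker."""
    if olap_workers <= 0:
        raise ValueError("at least one OLAP worker is required")
    return qphh / olap_workers

# Read-Uncommitted rows from Table 4.4: (QphH, #OLAP, tabulated QpHpW)
rows = [(3.75, 7, 0.53), (4.79, 5, 0.95), (8.39, 6, 1.39), (2.79, 6, 0.46)]
for qphh, workers, expected in rows:
    # agreement within the two decimals reported by the benchmark
    assert abs(qphpw(qphh, workers) - expected) < 0.01
```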

Results for tests with target tps 2, 3 and 4 are shown in Figure 4.6, and all the metrics are tabulated in Table 4.4. Overall, the number of OLAP clients achieved does not vary greatly across tests. We observe the same tendency of TPC-C and TPC-H values increasing as the target tps grows. We note that, overall, the TPC-H values are not as high as with PostgreSQL. We also note that when moving from a target tps of 3 to 4 there is a decrease in analytic performance, which can be explained by the competition for resources from the TPC-C transactions.

Figure 4.6: MySQL registered performance - Read-Uncommitted: (a) target tps 2; (b) target tps 3; and, (c) target tps 4.


4.2.2.2 Transaction Isolation: Read-Committed

Figure 4.7: MySQL registered performance: Read-Committed and target tps 1

target tps   #OLAP   QphH   QpHpW @ tpmC
1            5       1.59   0.32 @ 9.98
2            5       3.99   0.79 @ 23.27
3            5       3.39   0.67 @ 40.99
4            7       7.11   1.01 @ 51.73

Table 4.5: MySQL unified metric: Read-Committed

The second set of results concerns the isolation level Read-Committed. The plot in Figure 4.7 shows that the SUT hit the target tps in the very first minute and sustained it until the 5th minute, before going into saturation. Before saturating, the Client Balancer launched one OLAP worker every minute, raising the tally to 5. This number of OLAP clients is generally consistent across changes in the target tps, with a small difference at the highest value, where the system was able to spawn up to 7 clients.

The TPC-C metric stood at 9.98 tpmC and the TPC-H metric at 1.59 QphH. With 5 OLAP workers launched, the unified metric QpHpW evaluates to 0.32 @ 9.98 tpmC.

The plots for test results with target tps 2, 3 and 4 are depicted in Figure 4.8, and the metrics are tabulated in Table 4.5. The highest tpmC observed stands at 51.73, with target tps 4. These results are overall similar to those of PostgreSQL at the same isolation level in terms of tpmC. This validates that both systems are comparably fit for supporting OLTP workloads with this isolation level. However, at most tps values PostgreSQL shows slightly better values for the concurrent support of TPC-H clients.

Figure 4.8: MySQL registered performance - Read-Committed: (a) target tps 2; (b) target tps 3; and, (c) target tps 4.


We note that the number of clients spawned is smaller when compared to the lower isolation level for MySQL. However, we also note that, contrary to expectations, the QphH values are slightly higher, while the unified metric QpHpW is similar when comparing the isolation levels.

Interestingly, the results also show no large impact of increasing the tps on the analytic performance. Though the cause is not clear, these results seem to indicate a good isolation between workloads in the SUT at this isolation level as the tps increases. This is interesting to note since, in contrast to PostgreSQL, MySQL does not show even a minor impact on the analytic metrics when increasing the tps.

4.2.2.3 Transaction Isolation: Repeatable-Read

Figure 4.9: MySQL registered performance: Repeatable-Read and target tps 1

target tps   #OLAP   QphH   QpHpW @ tpmC
1            3       1.90   0.63 @ 8.66
2            6       5.19   0.86 @ 24.34
3            7       4.51   0.64 @ 38.42
4            4       5.19   1.29 @ 48.49

Table 4.6: MySQL unified metric: Repeatable-Read

The next set of results on MySQL comprises tests for the transaction isolation level Repeatable-Read. This level moves up, compared to the former levels, in terms of consistency requirements.


Figure 4.10: MySQL registered performance - Repeatable-Read: (a) target tps 2; (b) target tps 3; and, (c) target tps 4.

The plot in Figure 4.9 shows that the SUT achieved the target tps in the very first minute and sustained it until the 4th minute, before going into saturation. Prior to the system breaking the 20% error margin threshold, the Client Balancer was able to launch only 3 OLAP workers. This shows an expected reduction in the number of possible clients when compared with less strict isolation levels at this target tps. The TPC-C metric stood at 8.66 tpmC and the TPC-H metric at 1.90 QphH. With 3 active OLAP workers, the unified metric QpHpW evaluates to 0.63 @ 8.66 tpmC.

Results for tests with target tps 2, 3 and 4 are plotted in Figure 4.10, and the metrics are tabulated in Table 4.6. Similar to the former isolation levels, we find consistent tpmC values with increasing tps.

Unlike all previous cases studied, we see larger variations in the number of clients supported across the target tps values. For example, when increasing the tps from 1 to 2, the number of OLAP clients doubles, but when moving from 3 to 4 it is almost halved. These differences are difficult to explain, but we can attribute them to subtle aspects pertaining to transaction rollbacks and the conditions under which saturation was reached.

We also note that, similar to the results for Read-Committed, the TPC-H values do not display a visible downward trend as the tps increases. This suggests that studying how the system scales as the target tps increases, for this isolation level, could provide good insights about the SUT.

The highest observed metric values are for the test configured with target tps 4. It is also interesting to note, as an illustrative case, what occurs at a target tps of 3. Here we see clearly that a higher number of OLAP clients does not necessarily mean that a higher number of OLAP queries is actually processed (as evidenced by the fact that this configuration does not achieve higher TPC-H metrics).

4.2.2.4 Transaction Isolation: Serializable

Figure 4.11: MySQL registered performance: Serializable and target tps 1

We conclude our presentation of the MySQL-InnoDB results by considering the highest transaction isolation level, Serializable. Results with target tps 1 are plotted in Figure 4.11. The SUT, as expected, reached the target tps in the very first minute after the launch of HTAPBench. However, it could sustain the target tps only until the second minute, after which the system saturated. Prior to the saturation, HTAPBench was able to launch one OLAP worker, which turned out to be the only OLAP worker launched throughout the test.

target tps   #OLAP   QphH   QpHpW @ tpmC
1            1       2.53   2.53 @ 3.29
2            1       0.39   0.39 @ 10.52
3            1       0      0 @ 15.26
4            1       0      0 @ 20.45

Table 4.7: MySQL unified metric: Serializable

For the target tps of 1, the TPC-C and TPC-H metrics read 3.29 tpmC and 2.53 QphH respectively. With just 1 active OLAP worker, the unified metric QpHpW equals 2.53 @ 3.29 tpmC. Results for tests with target tps 2, 3 and 4 show similar trends in Figure 4.12.

The results overall are consistent with our expectation that the strictness of the isolation level reduces the number of OLAP clients that are launched, since the tpmC is reduced due to more frequent aborts and rollbacks. A similar, but less marked, trend (in terms of only having 1 OLAP client) was observed in PostgreSQL with its two isolation levels.

While the TPC-C metric offered a small improvement with increasing target tps, as expected, the TPC-H metric eventually dies off for target tps 3 and 4. This breaks the trend observed for the previous isolation levels in MySQL-InnoDB, where increasing the tps was not shown to strongly affect the TPC-H workload.

The poor completion rate of TPC-H queries observed as the target tps increases might be due to lock starvation for these queries. Precisely identifying this requires further studies that better consider the internal metrics offered by the SUT.

It could be possible to study more subtle configuration details regarding concurrency control, which could improve the completion rate of TPC-H queries (for example, disabling the autocommit mode or using consistent snapshot features).
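As a sketch of where such knobs live, the following MySQL option-file fragment shows illustrative settings only; these are not the values used in our experiments:

```ini
# Illustrative my.cnf fragment (assumed values, not our test configuration)
[mysqld]
transaction-isolation    = SERIALIZABLE   # default isolation for new connections
autocommit               = 0              # clients must commit explicitly
innodb_lock_wait_timeout = 120            # give long analytic queries more slack
```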


Figure 4.12: MySQL registered performance - Serializable: (a) target tps 2; (b) target tps 3; and, (c) target tps 4.


We will now present the results with the MyISAM storage engine and a brief interpretation of the same. The implementation was similar to that with InnoDB, except for the explicit step of setting the storage engine to MyISAM while creating the database tables via the DDL files.

Figure 4.13: MySQL registered performance: MyISAM engine

The results for MySQL with the MyISAM storage engine are depicted in Figure 4.13. Clearly the SUT performed distinctly worse compared to InnoDB, but this is perhaps an unfair comparison of a state-of-the-art engine against an engine that nowadays receives minimal support. Still, this study shows the merit of the benchmark in clearly differentiating the performance of the two engines, which might otherwise not be well known to database administrators.

The SUT sustained the target tps until the 3rd minute, allowing the Client Balancer to launch 2 OLAP workers. The TPC-C and TPC-H metrics stood at 2.65 tpmC and 0.29 QphH respectively. The unified HTAPBench metric QpHpW evaluated to 0.145 @ 2.65 tpmC. These results contradict our expectation that the system would perform better than InnoDB in terms of tpmC. As stated previously, factors other than transaction flexibility influence the evaluation.

With this, we conclude the discussion of the results for MySQL and for the OLTP database systems.

4.2.3 Discussion

It is essential to portray a cumulative picture of all the results pertaining to the individual database systems, to account for their performance evaluation. We have thus produced quadrant plots of the unified metric for configurations with transaction isolation Read-Committed and target tps 1. These plots indicate how close a database system is to HTAP functionality.


Figure 4.14 depicts the cumulative performance registered by PostgreSQL. Tests with both supported isolation levels and the other configurations registered contrasting performances. However, consistent with its design focus on processing OLTP workloads, the metric leaned clearly towards the OLTP edge, supporting minimal OLAP operations.

Figure 4.14: Unified metric quadrant plot: PostgreSQL

The quadrant plot of the MySQL tests in Figure 4.15 is no different from that of PostgreSQL. Three of the four supported isolation levels exhibited strong performance with the OLTP workload while processing minimal analytic queries. However, the test configured with transaction isolation Serializable showed higher efficiency in analytic query processing, similar to the performance observed for PostgreSQL at this isolation level, but remained inefficient with the transactional workload. The registered performance is principally inclined towards a pure OLTP database system, as was expected.

Figure 4.15: Unified metric quadrant plot: MySQL


If a choice were to be made between both systems at this expected tps, PostgreSQL should be chosen when a serializable isolation level is required, and MySQL with InnoDB when a Read-Committed level suffices. The suitability of PostgreSQL for a strict isolation level is mostly due to its higher TPC-H values, which are in part due to better latencies in query processing under transactional consistency scenarios. These results are consistent with the observation that PostgreSQL can achieve lower query latency, but not necessarily better TPC-C throughput, than MySQL with InnoDB [27]. We expect that as we increase the tps, the performance of PostgreSQL on TPC-C queries with this isolation level will not improve over that of MySQL with InnoDB. Unfortunately, the latter would still not be able to handle more TPC-H queries well unless special configurations are used. Hence, though we might prefer MySQL with InnoDB for TPC-C processing at this isolation level, for mixed workloads PostgreSQL seems like the comparatively better choice.

With this we conclude the results for the OLTP database systems. In the next section, we present the test results acquired for an OLAP database system and discuss the observed performance.

4.3 OLAP Database System

In this section, we discuss the results acquired from the experiments over the OLAP database system, MonetDB, and we compare the findings with the results for the OLTP database systems previously discussed in Section 4.2.

The configurable parameters remain unchanged for the OLAP database system under test as well. Since MonetDB is designed as an analytical database engine, results are expected to contrast with those of the OLTP engines. We expect a higher QphH, and are interested in whether the system achieves the target tps and, if so, how long it sustains it before plunging into saturation. One parameter that nudges the system's behaviour towards either OLTP or OLAP efficiency is the isolation level. We have thus tested MonetDB with the supported isolation levels.

4.3.1 HTAPBench Test Results over MonetDB

HTAPBench experiments are conducted against MonetDB with configurations encompassing the supported transaction isolation levels: Read-Uncommitted, Read-Committed and Serializable. The target tps is set to 1 throughout the experiments because, MonetDB being a columnar OLAP database engine, testing it against higher transactions per second would yield poor OLTP throughput, as observed in Figure 2.14, and would not help in evaluating the system.

MonetDB5 was installed, followed by the creation of the database and of a user with all the required privileges. Configurations remain unchanged across the three tests conducted (one per isolation level). Additional JDBC Maven dependencies had to be included in the project prior to its compilation. The configuration file is altered to include the JDBC and database URLs so that the application can connect to the database server, as can be seen in Figure 4.16.

In addition, to run with this database, some issues pertaining to the management of prepared statements in the presence of rollbacks needed to be fixed in the benchmark itself.

Figure 4.16: MonetDB JDBC connection configuration
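For reference, the extra Maven dependency would look roughly like the following; the coordinates match the driver as published by MonetDB on Maven Central, while the version shown is a placeholder:

```xml
<!-- MonetDB JDBC driver; version number is a placeholder -->
<dependency>
    <groupId>monetdb</groupId>
    <artifactId>monetdb-jdbc</artifactId>
    <version>2.28</version>
</dependency>
```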

Isolation Level    #OLAP   QphH   QpHpW @ tpmC
Read-Committed     1       5.29   5.29 @ 1.05
Serializable       2       3.29   1.64 @ 0.99
Read-Uncommitted   2       1.39   0.69 @ 1.13

Table 4.8: MonetDB unified metric

The results of the experiments over MonetDB using HTAPBench are plotted in Figure 4.17 for the supported isolation levels, and the registered metrics are tabulated in Table 4.8. As depicted in Figure 4.17(a) for the isolation level Read-Committed, the SUT reaches the target tps in the very first minute but does not sustain it for long, saturating at the end of the first minute. The Client Balancer releases one OLAP client prior to the system breakdown. These results resemble the trend of early saturation depicted in the developers' initial results in Figure 2.14. The SUT sustained an OLTP throughput of 1.05 tpmC, while the TPC-H metric held at 5.29 QphH. This might be seen as a weak result; however, with just 1 OLAP client released, it implies a higher efficiency than the corresponding results for the OLTP systems. The unified metric QpHpW evaluates to 5.29 @ 1.05 tpmC. No tests were conducted with higher target tps, as the focus of evaluating this OLAP database system is not to probe its transactional capabilities.


Figure 4.17: MonetDB registered performance: (a) Read-Committed; (b) Serializable; and, (c) Read-Uncommitted.


Results for Serializable and Read-Uncommitted show a similar tendency in their behavior, but with lower registered QphH values. This is somewhat unexpected, given the transaction log feature of MonetDB, which makes only committed transactions available to read. Though the other isolation levels are supported, this commit requirement tends to slow down query execution. For instance, with Serializable, requests are serialized and queries are delayed by queuing times, usually extended by the higher latencies of a columnar engine, unlike non-locking consistent reads where concurrency is preserved. The unified metric for the transaction isolation levels Serializable and Read-Uncommitted settled at 1.64 @ 0.99 tpmC and 0.69 @ 1.13 tpmC respectively, with 2 active OLAP workers in both cases.

Overall, the results show that a smaller number of clients actually leads to better TPC-H metrics. The results also show poor tpmC values, as expected since this system is not designed for TPC-C-like workloads, and an unclear influence of the isolation levels. We expected the weakest isolation level to lead to the poorest TPC-H performance, by allowing more TPC-C clients and a high tpmC, and this was indeed the case. We also expected the strongest isolation level to lead to the lowest tpmC (as observed), but also to the highest analytic performance. In the end, the best analytic performance was achieved at the middle isolation level. This result is slightly surprising.

A clear conclusion can be derived from the results obtained for the OLTP and OLAP database systems tested. While the OLTP SUTs registered higher transactional throughput, the OLAP SUT remained low on transactions. The OLTP SUTs' ability to sustain the target tps for longer allowed the Client Balancer to release a higher number of OLAP clients, compared to the OLAP SUT, which saturated early. But the efficiency of the OLAP workers remained higher for the OLAP SUT, portraying its design purpose, while the OLTP SUTs rendered poor efficiency for the OLAP workers released.

With this discussion, we conclude this section on the OLAP SUT experiments and results interpretation.

4.3.2 Discussion

We sketch a quadrant plot for the performance analysis of MonetDB on mixed workloads, with the unified metric evaluated from the test results. The quadrant plot for the MonetDB tests is depicted in Figure 4.18. While all the tests registered a similarly limited performance with the OLTP workload, the unified metric QpHpW was the highest observed number of queries processed per OLAP client in our study, as expected. This behaviour validates its column-oriented design for the speedy scanning of large amounts of data, assisted by its multi-layered software stack responsible for the query execution strategy, and its strong clients. Our results also show the interesting finding that the best trade-off performance is achieved at the Read-Committed isolation level.


Figure 4.18: Unified metric quadrant plot: MonetDB

With this, we conclude the results interpretation and performance evaluation of the sole OLAP database system deployed for our experiments.

4.4 Summary

The key takeaways from this chapter are listed below:

• The research questions addressed in this chapter, for both the OLTP and OLAP SUTs, are answered by the acquired results, which depicted the influence of different isolation levels, of the target tps, and, where pertinent, of storage engine selections.

• We presented an overview of the implementation aspects necessary to evaluate the three database systems studied in this chapter.

• We presented our results, plotted and tabulated, and summarized them in quadrant plots according to the unified metric.

• Results for PostgreSQL and MySQL were included as representatives of OLTP database systems. The results resemble, in behavior, the initial results produced by the developers, assuring us of the correct implementation of the experiments.

– In both systems, there is an impact of increasing the strictness of the isolation level.

– Stronger isolation levels lead to lower TPC-C throughput. This is to be expected, since there will be more transaction rollbacks and lock contention. Such differences become more notable at high target tps. To illustrate this, we can compare: PostgreSQL achieves a tpmC of 55.73 at its lower isolation level with a target tps of 4, and a tpmC of 6.68 at its highest isolation level with the same tps. In the same scenarios, MySQL with InnoDB achieves tpmC values of 46.54 and 20.45 respectively. Considering just the TPC-C values, these results confirm that MySQL can achieve a higher throughput for TPC-C than PostgreSQL, as shown in previous studies [27].

– Stronger isolation levels actually benefit TPC-H queries scheduled concurrently to TPC-C transactions in some cases. For PostgreSQL, a QphH of 10.20 improves to 31.98 as the isolation level is made stronger at a tps of 4. This, however, does not hold for MySQL, which sees its QphH reach 0 at high tps under the Serializable isolation level. These results are consistent with the observation that PostgreSQL has good query latency when managing transactionally consistent query processing [27], as compared to MySQL. It should be possible to evaluate optimizations to processing with serializable isolation in MySQL (e.g. the impact of autocommit, or using snapshot isolation statements), but such aspects were outside the scope of our work.

– Regarding the impact of increasing the target tps, we observed that generally this leads to improved tpmC and analytic query performance. With MySQL, at relatively low isolation levels, it is possible that increasing the target tps produces a system offering a good trade-off between the two workloads. Hence, when the required isolation level is only Read-Committed, MySQL with InnoDB might be a good choice. Still, further studies are required.

– The use of alternative storage engines for MySQL (InnoDB against MyISAM) does not allow studying purely the impact of alternative offerings of transaction flexibility. Results only show a poor overall behaviour for MyISAM. Future work could study the influence of alternative storage engines in more depth.

– If a choice were to be made between both systems, at the expected tps of 1, PostgreSQL should be chosen when a serializable isolation level is required overall, and MySQL with InnoDB when a Read-Committed level is enough.

• We discussed the results acquired for MonetDB.

– OLAP systems are immediately distinguishable in their behavior over mixed workloads in HTAPBench, since they are not able to spawn a large number of clients. This is understandable, since the clients are strong and resource-intensive.

– Overall, results show that a smaller number of clients actually leads to better TPC-H metrics for MonetDB, when compared to the OLTP systems. Results also show poor tpmC values, as expected.

– Results show an unclear influence of the isolation levels. We expected the weakest isolation level to lead to the poorest TPC-H performance, by creating more possible TPC-C clients and a high tpmC. This was the case. We also expected the strongest isolation level to lead to the lowest tpmC (as observed), but also to the highest analytic performance. In the end, the best analytic performance was achieved at the middle isolation level. This result was slightly surprising.

• Finally, from the studies with these systems, we observe that although the number of clients spawned in the HTAPBench benchmark seems generally related to the TPC-H performance, this is not always the case; hence the unified metric is a better indicator of the overall performance of the SUT than the number of clients alone.

In the next chapter we present the results acquired for tests over HTAP database systems, and we discuss these results.


5. Evaluation of HTAP Database Systems with HTAPBench

In this chapter, we introduce our experiments with Hybrid Transactional and Analytical Processing (HTAP) database systems. We give a brief overview of the required implementation details, and we interpret the results obtained when using HTAPBench.

We structure the chapter as follows:

• Research Questions: We begin by enumerating the research questions pertaining to this chapter in Section 5.1.

• Results Interpretation: In Section 5.2, we present our results from the experiments conducted, evaluate them, and answer the research questions framed for this chapter.

5.1 Research questions

This section provides an overview of the research question within the scope of our exploration to evaluate HTAP database systems with HTAPBench:

1. HTAP Systems: To what extent do configurable parameters, such as isolation levels or system-specific ones, impact the system performance and the isolation among clients in an HTAP workload? What are the maximum tpmC and QphH the SUT can sustain, and how distant are these results from the OLTP and OLAP tests?


5.2 Results Interpretation and Evaluation

In this section we interpret and discuss the results acquired for tests over the HTAP database systems considered for our experiments: MemSQL and CockroachDB. Towards the end we provide a detailed comparison between the results obtained for all the database systems tested.

5.2.1 Results over MemSQL

We begin by putting forth the implementation details for our experiments with MemSQL. The MemSQL version deployed for the experiments is 5.78. HTAPBench is built with minor alterations to the DDL statements to match the underlying aggregation framework of MemSQL. The remaining steps were followed sequentially as detailed in Section 3.2. The MySQL JDBC driver supports connections to the MemSQL server, and hence no additional dependencies were required.

The connection details are as depicted in Figure 5.1.

Figure 5.1: MemSQL JDBC connection configuration
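Because MemSQL speaks the MySQL wire protocol, the connection entries mirror the MySQL ones, simply pointing at the master aggregator; the host, port and database name below are illustrative assumptions, not a verbatim copy of Figure 5.1:

```xml
<!-- Illustrative: MemSQL is reached through the standard MySQL JDBC driver -->
<dbtype>mysql</dbtype>
<driver>com.mysql.cj.jdbc.Driver</driver>
<!-- master aggregator endpoint; MemSQL listens on the MySQL default port -->
<DBUrl>jdbc:mysql://localhost:3306/htapb</DBUrl>
```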

We deployed a MemSQL cluster with 3 virtual nodes on the same machine, with 1 master aggregator node and 2 leaf nodes. This is the minimum number of nodes needed to observe the behaviour of a distributed database system.

A cumulative total of 17 tests were conducted with the MemSQL database system, of which 16 were deployed with the database configurations mentioned above and 1 with a column-store engine, to show the performance difference. Results are first grouped by configurations with different isolation levels.


5.2.1.1 Transaction Isolation: Read-Uncommitted

[Plot omitted: throughput (txn/sec, left axis) and #OLAP workers (right axis) vs. time (min); series: OLTP, OLAP, target.]

Figure 5.2: MemSQL registered performance: Read-Uncommitted and target tps 1

target tps | #OLAP | QphH  | QpHpW @ tpmC
1          | 3     | 5.29  | 1.76 @ 8.52
2          | 4     | 8.79  | 2.19 @ 24.78
3          | 5     | 15.58 | 3.13 @ 36.45
4          | 4     | 27.58 | 6.89 @ 45.77

Table 5.1: MemSQL unified metric: Read-Uncommitted

We begin with the results for the configurations with transaction isolation Read-Uncommitted. The plotted results in Figure 5.2 show that the SUT achieved the required OLTP throughput and was able to sustain it until the threshold breakdown in the 4th minute. The Client Balancer was able to release 3 OLAP workers prior to system saturation in the 4th minute. The SUT registered 8.52 tpmC and 5.29 QphH. The unified metric QpHpW, with 3 active OLAP clients, evaluates to 1.76 @ 8.52 tpmC.
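For reference, the unified metric can be reproduced from the reported numbers: QpHpW is the hourly analytical throughput (QphH) divided by the number of active OLAP workers, reported together with the sustained tpmC. A minimal check against the first row of Table 5.1:

```python
def qphpw(qphh: float, olap_workers: int) -> float:
    """Unified metric: queries per hour per OLAP worker."""
    return qphh / olap_workers

# Values reported for MemSQL, Read-Uncommitted, target tps 1 (Table 5.1).
qphh, workers, tpmc = 5.29, 3, 8.52
print(f"{qphpw(qphh, workers):.2f} @ {tpmc} tpmC")  # → 1.76 @ 8.52 tpmC
```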

In terms of the number of clients spawned, the results differ from those of OLAP or OLTP systems, reaching a comparably higher and lower number of clients, respectively. In fact, a relation between either TPC-C or TPC-H performance and the number of clients is hard to establish.

Results also show a comparatively high tpmC, remarkably close to that of MySQL in the same configuration through all target tps, but with higher performance in the TPC-H workloads. At the highest tps for this isolation level, these results amount to a unified metric of 6.89 @ 45.77 for MemSQL, against 0.46 @ 46.54 for MySQL with InnoDB.

Results also differ from those of OLTP systems by showing increasing TPC-C and TPC-H performance as the target tps increases.

[Plots (a)-(c) omitted: throughput (txn/sec) and #OLAP workers vs. time (min); series: OLTP, OLAP, target.]

Figure 5.3: MemSQL registered performance - Read-Uncommitted: (a) target tps 2; (b) target tps 3; and, (c) target tps 4.


5.2.1.2 Transaction Isolation: Read-Committed

[Plot omitted: throughput (txn/sec) and #OLAP workers vs. time (min); series: OLTP, OLAP, target.]

Figure 5.4: MemSQL registered performance: Read-Committed and target tps 1

target tps | #OLAP | QphH  | QpHpW @ tpmC
1          | 3     | 5.49  | 1.83 @ 8.27
2          | 6     | 17.18 | 2.86 @ 25.52
3          | 4     | 5.59  | 1.39 @ 33.28
4          | 5     | 6.79  | 1.35 @ 49.84

Table 5.2: MemSQL unified metric: Read-Committed

Figure 5.4 depicts the familiar trend of reaching the target tps in the very first minute; the SUT sustains it until the end of the 3rd minute, allowing the Client Balancer to release 3 OLAP workers. Once the threshold barrier with a 20% error margin was broken, the OLTP throughput slid lower and stabilized until the very end of the execution time. The OLTP throughput registered is 8.27 tpmC while sustaining 5.49 QphH. With 3 active OLAP clients the unified metric QpHpW equals 1.83 @ 8.27 tpmC. These measurements match our expected system performance, with the OLTP throughput slightly lower than what was observed for the OLTP SUT and higher than for the OLAP SUT. The measure of efficiency with OLAP workers, QpHpW, for MemSQL resides precisely in between those for the OLTP and OLAP SUT.
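The release behaviour described above can be summarized schematically. The sketch below is our own simplification, not HTAPBench's implementation: one OLAP worker is admitted per sampling interval while the measured OLTP throughput stays within the 20% error margin of the target.

```python
def olap_workers_released(measured_tps, target_tps, error_margin=0.20):
    """Schematic Client Balancer: count the OLAP workers released before
    the OLTP throughput first drops below (1 - error_margin) * target."""
    threshold = (1.0 - error_margin) * target_tps
    released = 0
    for tps in measured_tps:
        if tps < threshold:   # threshold barrier broken: stop releasing
            break
        released += 1         # target sustained: admit one more OLAP worker
    return released

# Hypothetical per-minute trace: target held for 3 minutes, then saturation.
print(olap_workers_released([1.0, 1.0, 0.98, 0.75, 0.60], target_tps=1.0))  # → 3
```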

In terms of the number of clients, these results do not show a clear trend, but still show more clients than the OLAP systems, and fewer than or comparable to the OLTP systems at the same isolation level.


[Plots (a)-(c) omitted: throughput (txn/sec) and #OLAP workers vs. time (min); series: OLTP, OLAP, target.]

Figure 5.5: MemSQL registered performance - Read-Committed: (a) target tps 2; (b) target tps 3; and, (c) target tps 4.

Considering the performance at the TPC-C workload, increasing the strictness of the isolation level does improve it slightly, with values at the highest tps being 49.84 against the previously registered 45.77. This is a somewhat surprising behavior, since the expectation is that stronger isolation levels will lead to more rollbacks and reduce the TPC-C throughput. TPC-H values do, however, seem to be more affected, with a unified metric QpHpW of 1.35 @ 49.84 tpmC showing some difference with respect to the metric of 6.89 @ 45.77 tpmC at the lower isolation level. Similarly, the QphH drops from 27.58 to 6.79. These results are also unexpected, since we assumed that there would be a drop in TPC-C performance leading to an improvement in TPC-H performance, as for example in the move between isolation levels in PostgreSQL.

When contrasted with other systems at this isolation level and the highest tps, MemSQL shows a higher number of clients than PostgreSQL (5 against 3), but a lower number than MySQL (7). It does not achieve the highest value on the unified metric, which is achieved by PostgreSQL. When considering the lowest tps, MySQL achieves the highest TPC-C performance (9.98 tpmC) and MemSQL achieves the best TPC-H performance (5.49 QphH). MemSQL also achieves a competitive unified metric of 1.83 @ 8.27 tpmC, compared to the close second (PostgreSQL) at 0.50 @ 8.49 tpmC.

Results show improvements in TPC-C performance, and some oscillations in TPC-H performance, as the target tps increases.

5.2.1.3 Transaction Isolation: Repeatable-Read

[Plot omitted: throughput (txn/sec) and #OLAP workers vs. time (min); series: OLTP, OLAP, target.]

Figure 5.6: MemSQL registered performance: Repeatable-Read and target tps 1

target tps | #OLAP | QphH  | QpHpW @ tpmC
1          | 3     | 5.09  | 1.69 @ 7.63
2          | 6     | 11.58 | 1.93 @ 26.29
3          | 4     | 6.79  | 1.69 @ 37.49
4          | 4     | 9.19  | 2.29 @ 47.86

Table 5.3: MemSQL unified metric: Repeatable-Read


In this study we consider a stricter isolation level, Repeatable-Read. Figure 5.6 depicts the same fact observed throughout our experiments: the target tps is reached in the first minute. The SUT was able to sustain the required OLTP throughput until the end of the 3rd minute, letting the Client Balancer launch 3 OLAP workers before the system performance dived below the 20% error margin threshold.

[Plots (a)-(c) omitted: throughput (txn/sec) and #OLAP workers vs. time (min); series: OLTP, OLAP, target.]

Figure 5.7: MemSQL registered performance - Repeatable-Read: (a) target tps 2; (b) target tps 3; and, (c) target tps 4.


The SUT registered 7.63 tpmC and 5.09 QphH. The unified metric QpHpW, with 3 active OLAP clients, evaluates to 1.69 @ 7.63 tpmC. With higher target tps, the SUT was able to produce a better metric than with Read-Committed. The highest unified metric QpHpW was once again observed for target tps 4: 2.29 @ 47.86 tpmC.

When considering the number of clients, the observation remains that this factor has little visible influence on the overall efficiency of either the TPC-C or TPC-H workload.

The move to a stricter isolation level, when contrasted with Read-Committed, has mixed effects. At the smallest and the highest target tps we observe a small amount of the expected deterioration in TPC-C performance, and improvements in TPC-H performance. However, this does not hold across the intermediate tps values. For tps 3 both TPC-H and TPC-C performance are better, whereas for tps 2 the TPC-C performance is better but the TPC-H performance is worse.

When compared with the other SUT that supports this isolation level (MySQL with InnoDB), we find that MemSQL performs in general slightly lower for TPC-C transactions, but better for TPC-H queries. Thus the HTAP characteristics of the SUT are validated.

Finally, when considering the impact of increasing tps, we find that, similar to the case of Read-Uncommitted, the performance of TPC-C and TPC-H queries seems to generally improve.

5.2.1.4 Transaction Isolation: Serializable

[Plot omitted: throughput (txn/sec) and #OLAP workers vs. time (min); series: OLTP, OLAP, target.]

Figure 5.8: MemSQL registered performance: Serializable and target tps 1

target tps | #OLAP | QphH  | QpHpW @ tpmC
1          | 2     | 7.09  | 3.54 @ 8.27
2          | 3     | 10.79 | 3.59 @ 25.52
3          | 4     | 8.79  | 2.19 @ 33.28
4          | 5     | 29.17 | 5.83 @ 49.84

Table 5.4: MemSQL unified metric: Serializable

The final experiments with this SUT and the isolation levels comprise the Serializable level. Figure 5.8 depicts that the SUT was able to reach the target tps early in the execution time window and sustain it until the end of the second minute. The Client Balancer was able to configure and release 2 OLAP workers before the threshold was broken. The OLTP throughput did not decrease to the magnitude observed for Read-Committed. The SUT sustained 8.27 tpmC and 7.09 QphH. With 2 active OLAP clients the unified metric QpHpW evaluates to 3.54 @ 8.27 tpmC. The metric with this configured transaction isolation performed better than the previous one.

Similar performance is observed in the results with higher target tps, which corroborates the better performance. In fact, this is perhaps the best unified metric observed in the experiments conducted up until now; the best metric QpHpW registered is for the configuration of this transaction isolation with target tps 4, 5.83 @ 49.84 tpmC, which attests to the SUT's ability to process mixed workloads with a minimal, tolerable trade-off.

In terms of the number of clients, an increasing number of clients is released with growing tps, but the relation between this number and the observed TPC-H performance remains unclear.

TPC-C and TPC-H performance, with respect to the former isolation level, is shown to be generally better. This is slightly surprising, since the expectation was a deteriorated TPC-C performance but an improved TPC-H performance.

When compared to other systems at the same isolation level, MemSQL achieves a better TPC-C performance across all cases. However, its TPC-H performance falls behind that of PostgreSQL. Still, MemSQL attains a more competitive unified metric owing to the efficiency of its OLAP clients.

Increasing the target tps seems to improve the overall TPC-C and TPC-H results. This suggests that future studies should evaluate the role of this factor and assess how it scales.


[Plots (a)-(c) omitted: throughput (txn/sec) and #OLAP workers vs. time (min); series: OLTP, OLAP, target.]

Figure 5.9: MemSQL registered performance - Serializable: (a) target tps 2; (b) target tps 3; and, (c) target tps 4.


5.2.1.5 Column-Store

We conducted a single test with the MemSQL column-store to better understand how to interpret the results when evaluating the system. The implementation details remain unchanged. Minor changes to the CREATE TABLE statements were required to include the clustered keys mandatory for the column-store engine.
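The change is of the following shape. This is a sketch based on MemSQL's documented column-store DDL at the time, with an abbreviated column list rather than the benchmark's actual schema:

```sql
-- A clustered columnstore key places the table in MemSQL's
-- column-store engine (abbreviated sketch, not the full TPC-C schema).
CREATE TABLE warehouse (
    w_id INT NOT NULL,
    w_name VARCHAR(10),
    KEY (w_id) USING CLUSTERED COLUMNSTORE
);
```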

[Plot omitted: throughput (txn/sec) and #OLAP workers vs. time (min); series: OLTP, OLAP, target.]

Figure 5.10: MemSQL registered performance: Column-store

As depicted in Figure 5.10, the SUT could not achieve the target tps at any given point throughout the test execution time. However, with the default configurations, the Client Balancer launched a single OLAP worker. The SUT registered a weaker metric of 0.86 tpmC and 0.89 QphH. The unified metric QpHpW equals 0.89 @ 0.86 tpmC. While trying to reach the target tps, the SUT reported smaller query numbers due to the framework of the benchmark in use.

We believe that future studies should expand on this evaluation.

5.2.2 Results over CockroachDB

CockroachDB is compatible with PostgreSQL and its JDBC driver, and thus requires no additional dependencies for using HTAPBench. The CockroachDB version installed for the experiments with HTAPBench is v19.1.3. We deployed it with a total of 3 virtual nodes. HTAPBench is built with some modifications to the database schema create statements. A few OLAP queries are not supported by CockroachDB and thus required considerable modifications. The server connection details are depicted in Figure 5.11.

Figure 5.11: CockroachDB JDBC connection configuration


Tests are conducted for only two configurations, both with transaction isolation Serializable, as any other isolation level is automatically upgraded to Serializable by CockroachDB. The plotted results are depicted in Figure 5.12. For target tps 1, the system was able to sustain the desired transactional throughput until the 8th minute, and the Client Balancer was able to release 7 OLAP clients. The registered OLTP and OLAP metrics are 8.64 tpmC and 2.99 QphH, respectively. The unified metric QpHpW thus evaluates to 0.43 @ 8.64 tpmC. This behaviour is distant from the ideal HTAP spectrum and leans more towards OLTP system behaviour. This is not unreasonable, given that the design architecture of CockroachDB does not yet extend complete support for HTAP functionality and employs a transactions-first policy. However, its query parallelism mechanism and high data availability across the nodes assure OLAP processing on the most recent data. For target tps 2 the registered metrics are 20.93 tpmC and 8.79 QphH with 3 active OLAP clients. The unified metric QpHpW evaluates to 2.93 @ 20.93 tpmC. This metric fits better into the spectrum of HTAP functionality.

Still, further tests are required to better understand this system.

[Plots (a)-(b) omitted: throughput (txn/sec) and #OLAP workers vs. time (min); series: OLTP, OLAP, target.]

Figure 5.12: CockroachDB registered performance - Serializable: (a) target tps 1; and, (b) target tps 2.


5.2.3 Discussion

Figure 5.13: Unified metric quadrant plot: MemSQL

For the results acquired using HTAPBench over the HTAP database systems, it is crucial to interpret the registered performance in the spectrum of the unified metric. The quadrant plot for MemSQL's registered performance is depicted in Figure 5.13. The metrics for the tests with all the different configurations reside close to the centre of the quadrant plot. This intermediate performance, compared to the high-throughput OLTP SUT and the analytically efficient OLAP SUT, is indeed the merit of an HTAP database system. The SUT was able to perform analytics on the transactional data while tolerating a reasonable trade-off with the transactional workload. The registered OLAP worker efficiency was a little over moderate, portraying the SUT's enhanced support for query processing compared to an OLTP database system.

Concluding this discussion, we present an introspective comparison of the results acquired for all database systems in the next section.

5.3 Analogy across Results

We produce a comparison similar to that elaborated earlier in Table 2.1, with the results we acquired through our experiments for transaction isolation Read-Committed, in Table 5.5.

SUT        | #OLAP | QphH | QpHpW @ tpmC
PostgreSQL | 6     | 2.99 | 0.50 @ 8.49
MySQL      | 5     | 1.59 | 0.32 @ 9.98
MonetDB    | 1     | 5.29 | 5.29 @ 1.05
MemSQL     | 3     | 5.49 | 1.83 @ 8.27

Table 5.5: Overview of acquired results

These metrics validate our experiments, with the results being in the correct orientation. The 2 OLTP database systems, PostgreSQL and MySQL, performed well with transactions and registered high tpmC, allowing a higher number of OLAP clients to be released, 6 and 5 respectively. The efficiency of the OLAP workers was poor, with the unified metric noting below 1 query per hour per worker, as expected from database engines designed with a focus on OLTP workloads. The OLAP database system MonetDB registered the lowest TPC-C metric of 1.05 tpmC, allowing just 1 OLAP client to be released. However, that OLAP worker was the most efficient, registering 5.29 QphH for a single worker. The resulting QpHpW of 5.29 is the highest amongst all the SUTs in our experiments. The hybrid database engine, MemSQL, registered a unified metric QpHpW of 1.83 @ 8.27 tpmC, residing precisely in between the OLTP and OLAP SUT, which was the expected end result. The efficiency of its OLAP workers was higher than that of the OLTP SUT and lower than that of the OLAP SUT. The registered transactional (OLTP) throughput was slightly lower than that of the OLTP SUT and much higher than that of the OLAP SUT.

Figure 5.14: Unified metric quadrant plot: cumulative overview with Read-Committed isolation

Figure 5.14 depicts the system performance of all the SUTs in the spectrum of HTAP functionality, at the Read-Committed isolation level. A simple understanding that can be derived from the acquired results and the literature survey on HTAP systems is that database systems operating on mixed workloads must produce high transactional throughput while being able to perform business analytical operations on the real-time transactional data. HTAP systems need not be as efficient at OLAP processing as an OLAP database system. This is because, for real-time analytics, queries do not need to scan large amounts of data, but rather the smaller data volumes recently updated in the transactional memory. Thus, moderate efficiency at OLAP processing suffices the business needs for real-time analytics from an HTAP system, as observed in the results for MemSQL.

Overall, if a Pareto front were to be determined to find the optimal system given a Serializable isolation level and a target tps of 1, the trade-off choice would have to be made only between PostgreSQL and MemSQL (since they subsume all other systems), with PostgreSQL being the proper choice if the performance on OLAP workloads is more important than the performance on OLTP workloads, and MemSQL in the contrary case. If the optimal system is to be determined for the Read-Committed isolation level, then from the systems studied only MemSQL and MonetDB would need to be considered, with MonetDB being better for OLAP-major workloads, and MemSQL being more solid if the balance tips slightly more towards OLTP performance.
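The dominance test underlying such a Pareto analysis can be made explicit. Using the Read-Committed figures of Table 5.5 as (tpmC, QpHpW) pairs, MemSQL and MonetDB are mutually non-dominated, so choosing between them is a genuine OLTP-versus-OLAP trade-off. A sketch, with the axis choice our own and the data taken from this chapter:

```python
# (tpmC, QpHpW) at Read-Committed, target tps 1, taken from Table 5.5.
systems = {
    "PostgreSQL": (8.49, 0.50),
    "MySQL": (9.98, 0.32),
    "MonetDB": (1.05, 5.29),
    "MemSQL": (8.27, 1.83),
}

def dominates(a, b):
    """a dominates b if it is at least as good on both axes (higher is
    better for tpmC and QpHpW) and strictly better on at least one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])

print(dominates(systems["MemSQL"], systems["MonetDB"]))  # → False
print(dominates(systems["MonetDB"], systems["MemSQL"]))  # → False
```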

We thereby conclude the results interpretation and evaluation for the hybrid transactional and analytical processing (HTAP) database systems, and this chapter.

5.4 Summary

The key takeaways from this chapter are listed as follows:

• The research questions pertaining to this chapter were addressed with reference to the results obtained from our experiments.

• We presented the results acquired for HTAP database systems.

– Overall, HTAP systems show an intermediate performance regarding the number of clients spawned, which is higher than that of OLAP systems, and comparable to or lower than that of OLTP systems. This is similar to the observations of previous work with HTAPBench [11]. Still, throughout the studies it was not possible to establish a clear relation between the TPC-H performance and the number of OLAP clients.

– For MemSQL it was not possible to find a very large difference in behavior between the different isolation levels. Only for the Serializable level was it possible to notice large improvements in the support for OLAP workloads.

– The increase of target tps showed promising results at some isolation levels, suggesting that for MemSQL it would be worthwhile to study this further.

– The use of column-stores in MemSQL did not show good performance with HTAPBench. Further studies are needed.


– We report only 2 evaluations from our experiments with CockroachDB. These show that at a tps of 1, the system behaves like an OLTP system (high tpmC and low QpHpW), and at a tps of 2, the system behaves more like an HTAP system, with improved tpmC and QpHpW. Still, more studies are needed to characterize this system better.

• We discussed the performance as observed from the quadrant plots for all the SUTs and tried to characterize ideal industry-standard HTAP behaviour. To illustrate the applications of the unified metric proposed by HTAPBench, we showed that trade-off analysis can be done. For example, for the Serializable isolation level, such analysis shows that PostgreSQL and MemSQL are the two best candidates, with PostgreSQL being the better if the queries are more OLAP-oriented. On the other hand, if the isolation level is Read-Committed, MemSQL and MonetDB become the best candidates, with MonetDB being preferable only if the queries are markedly OLAP-oriented.


6. Conclusions and Future Work

In this thesis we carried out a series of experiments to evaluate the performance of diverse database systems operating on mixed workloads. To this end we used the recently developed HTAPBench benchmark. The core motivation for using this benchmark is to be able to compare systems on a unified metric. We expected this to help choose among systems in a trade-off space, considering the importance to users of the individual OLAP or OLTP performance within the mixed workload.

We started our work by reviewing the industry standards for workload-specific database benchmarks, TPC-C and TPC-H, and their reported performance metrics. We also reviewed in detail the design of HTAPBench and described its components. HTAPBench regulates the transactional workload, unlike any other benchmark, to better evaluate how keeping the performance on this workload within a threshold can impact the number of OLAP clients in the system. The Client Balancer, which is built into the core design of the benchmark, governs the release of new OLAP clients while sustaining the desired transactional throughput objectives with reasonable trade-offs. This mechanism is in line with business scenarios that require real-time analytics over recently updated data while reporting only small impacts on the transactional throughput.

For our evaluation we selected systems designed to be OLTP-major (PostgreSQL and MySQL), OLAP-major (MonetDB) and HTAP-major (CockroachDB and MemSQL). Test results for the two kinds of workload-specific database systems, OLTP and OLAP, registered contrasting performances, as anticipated from their design focus.

While the OLTP database systems were able to register peak transactional throughput, the OLAP database system evaluated was efficient with OLAP query processing, having a good efficiency per client. Different scale factors and transaction isolation levels influenced the system performance in different ways. Stronger isolation levels led to lower TPC-C throughputs and to better TPC-H latencies for the OLTP systems. Generally, increasing scale factors produced improved TPC-C and TPC-H performance. Increasing scale factors with the strongest isolation level showed the limits of MySQL, where the TPC-H performance was observed to deteriorate severely. As a takeaway, PostgreSQL was shown to be a good trade-off choice for stronger isolation levels, and MySQL for weaker ones. Additionally, we experimented with an alternative storage engine (MyISAM) to the recommended InnoDB engine for MySQL, finding an overall poor performance.

For MonetDB we found, as reported by Coelho et al. [11], a smaller number of clients spawned. This can be explained considering that clients are resource intensive. Our results also show poor TPC-C performance but good TPC-H metrics. We also report that the best performance is achieved at the Read-Committed isolation level.

For the HTAP systems we did most of our evaluations with MemSQL. This system, as expected, was able to launch a higher number of clients than the OLAP systems, but a smaller or comparable number than the OLTP systems. Overall, isolation levels did not seem to have a large impact on the behavior of MemSQL, and only the strongest isolation level was shown to reduce TPC-C performance and improve TPC-H performance. Increases in tps showed promising results at some isolation levels. These suggest that the system might perform well when facing larger workloads; this should be studied in future work.

From our limited tests with CockroachDB we were not able to observe it outperforming the alternative HTAP SUT. Still, more studies are needed in this regard.

Overall, the HTAP database systems produced results different from the prior two kinds of database systems, with their OLAP performance residing in the proximity of both. The observed system performance merits the alternative description of HTAP database systems as OLTP database systems with the capability to process business analytical queries on the most recent data, or data residing in the transactional memory.

From our experiments it is possible to conclude with some trade-off analysis about which database to select at an expected tps, considering the trade-off the user is willing to make between OLAP and OLTP performance. Hence, our results validate the ability of HTAPBench's unified metric to assist users in performing such analysis. From our results, for example, for the Serializable isolation level, PostgreSQL and MemSQL are the two best candidates, with PostgreSQL being the better if the queries are more OLAP-oriented. On the other hand, if the isolation level is Read-Committed, MemSQL and MonetDB become the best candidates, with MonetDB being preferable only if the queries are markedly OLAP-oriented.


6.1 Threats to Validity

In this section, we present possible threats to the validity of our conclusions.

• Internal threats:

– In our experiments using HTAPBench, we observed distinct system behaviour for the different transaction isolation levels. However, we did not dig deeper to validate the precise cause of the variations in system performance. It would be valuable to better understand how the actual design choices in the support for concurrency at different isolation levels affect the performance.

– The number of warehouses configured remained unaltered throughout our experiments, as the scale factors employed were not of higher magnitudes. However, the small number of warehouses could have influenced the OLAP performance of the tests with higher target tps.

– A test execution time of 60 minutes can reasonably be considered the best minimum execution window, but due to time constraints we conducted the tests with higher scale factors with an execution time of 15 minutes. We validated experimentally that the curtailment of the test run times has little influence on our results. Still, these configurations could produce different results over a 60-minute execution window.

– System performance improved for higher target tps (i.e., higher scale factors), but we did not verify up to what threshold scale factor the system can reproduce similar results before crashing due to an overload of requests. This is because tests with higher scale factors require too much time for the data loading stage, and it is not feasible to conduct a higher number of tests within a short span of time. To make better comparisons across systems it would be good to evaluate them at their saturation points.

– Database configuration parameters, beyond the default configurations used, could have a high impact on the observed performance.

• External threats:

– A question overlooked in this thesis is whether the acquired results would also be found for other HTAP database systems. There are numerous other vendors with state-of-the-art database systems that could be evaluated using HTAPBench, to generate ample results as a reference for comparisons of HTAP systems.

– Real-world business analytic queries are more complex, and the workload is ever-changing in size and nature. How well the results acquired using TPC-H or TPC-C reflect the real industry performance of HTAP systems is debatable.


6.2 Future Work

We list possible improvements for benchmarking systems with HTAPBench, and the potential fields of operation where the given unified performance metric can be applicable.

• Further comprehensive system performance evaluation can be done with extensive configurations of higher scale factors and other system-specific parameters.

• A more detailed examination can be done by extending the execution time, repeating tests to validate the results, and employing optimal cluster configurations.

• The unified metric of HTAPBench could be used as part of the feedback control mechanisms for automated database management tasks, self-configuring database systems, etc.

• There is a wide research space available for the discussion of workload design modifications to benchmarks that evaluate system performance on mixed workloads.

6.3 Concluding Remarks

We evaluated the system performance peculiar to OLTP, OLAP and HTAP systems under mixed workloads using HTAPBench. To the best of our understanding, we tried to evaluate the ideal behaviour of HTAP systems with respect to transactional and analytical workloads. This approach of evaluating system performance using benchmarks may not be 100% accurate, but it is without doubt the best available measure for understanding which data systems should be adopted to meet real-world requirements in a better and more precise way. We conclude our work with the hope that our research can contribute to extending good practices for pragmatic evaluations that further the understanding of database system performance under mixed workloads, and to trade-off analysis in selecting the most fitting database system.

Glossary

• Client Balancer: Control mechanism introduced in HTAPBench which regulates query processing, limiting the number of OLAP clients that are allowed to be spawned while keeping the reported TPC-C performance within a prefixed threshold range.

• Isolation levels (transaction isolation levels):

– Read-Uncommitted: Lowest level of isolation with high data availability, but highly vulnerable to the dirty-read phenomenon (where transactions are allowed to read data that has been modified by other transactions not yet committed).

– Read-Committed: Stronger isolation level that, in contrast to Read-Uncommitted, disallows dirty reads.

– Repeatable-Read: Similar to Read-Committed, but prevents a phenomenon called a non-repeatable read. This phenomenon can happen when a data item that is read at least twice within a transaction (and is not updated by that transaction) is not guaranteed to have the same value each time. This isolation level can be supported by applying read and write locks simultaneously on the data record, which makes only committed data available for both read and write operations.

– Serializable: This is the strongest of the isolation levels available. It imitates a serial execution and, as a result, avoids all known anomalies that might happen when reading and writing data concurrently in a database.

• Isolation (workload): A measure of how well the database can isolate OLTP clients from the impact of increasing the number of concurrent OLAP clients. The authors of HTAPBench propose that this can be quantified with the unified metric of their benchmark.

• OLAP: On-line analytical processing.

• OLAP workers: OLAP clients designated for query processing.

• OLTP: On-line transactional processing.

• QphH: “Queries per hour of type H” - performance metric of TPC-H, reported in HTAPBench.

• QpHpW @ tpmC : “Queries per hour, per worker, of type H” @ tpmC - unified performance metric of HTAPBench.

• Scale factor: Determines the data size, by scaling up.

• SUT: Database system under test.

• TPC-C: OLTP benchmark, from the TPC, to evaluate the transactional throughput of database systems.

• TPC-H: Decision support benchmark, from the TPC, to evaluate the OLAP performance of database systems.

• tpmC : “Transactions per minute” - performance metric of TPC-C, reported in HTAPBench.

• tps: Transactions per second.

• Warehouse: Different from a data warehouse; in the TPC-C benchmark, a warehouse is a central entity in the provided schema. It controls the scaling up of values inside the database, such that increasing the number of warehouses in the TPC-C and HTAPBench benchmark configurations increases the database size proportionally.
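To make the relationship between these quantities concrete, the unified metric can be sketched as follows. This is a hedged reconstruction from the glossary definitions above (analytical throughput QphH normalized by the number of OLAP workers, reported at the sustained tpmC); the function name and output format are illustrative assumptions, not HTAPBench's actual implementation.

```python
# Sketch of HTAPBench's unified metric, QpHpW @ tpmC: queries per hour
# of type H (QphH) divided by the number of OLAP workers, reported at
# the TPC-C throughput (tpmC) sustained during the same run.
# Reconstructed from the glossary definitions; the exact formula in
# HTAPBench may differ.

def unified_metric(qph_h: float, olap_workers: int, tpm_c: float) -> str:
    """Return the unified metric as a human-readable string."""
    if olap_workers <= 0:
        raise ValueError("at least one OLAP worker is required")
    qph_per_worker = qph_h / olap_workers
    return f"{qph_per_worker:.2f} QpHpW @ {tpm_c:.0f} tpmC"

# Example: 1200 analytical queries/hour spread over 4 OLAP workers,
# while sustaining 10000 TPC-C transactions per minute.
print(unified_metric(1200, 4, 10000))  # 300.00 QpHpW @ 10000 tpmC
```

Normalizing by the number of OLAP workers rewards systems that achieve analytical throughput with fewer clients, while the "@ tpmC" suffix keeps the transactional context in which that throughput was measured.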

Bibliography

[1] D. Harris, “3 alternatives to OLAP data warehouses.” https://www.softwareadvice.com/resources/olap-data-warehouse-alternatives/, Mar 2017. Last accessed 27 July 2019. (cited on Page 5 and 2)

[2] R. Sen and K. Ramachandra, “Characterizing resource sensitivity of database workloads,” in 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 657–669, Feb 2018. (cited on Page 5, 6, 11, 12, and 13)

[3] I. Psaroudakis, F. Wolf, N. May, T. Neumann, A. Böhm, A. Ailamaki, and K.-U. Sattler, “Scaling up mixed workloads: A battle of data freshness, flexibility, and scheduling,” in Performance Characterization and Benchmarking. Traditional to Big Data (R. Nambiar and M. Poess, eds.), (Cham), pp. 97–112, Springer International Publishing, 2015. (cited on Page 5, 1, 9, 12, 13, 52, and 54)

[4] M. Zhang, P. Martin, W. Powley, and J. Chen, “Workload management in database management systems: A taxonomy,” IEEE Transactions on Knowledge and Data Engineering, vol. 30, pp. 1386–1402, July 2018. (cited on Page 5, 6, 10, and 14)

[5] K. Prow, “Transaction isolation levels in SQL Server.” https://www.kevinprow.com/2015/09/20/transaction-isolation-levels-in-sql-server/. Last accessed 7 August 2019. (cited on Page 5 and 17)

[6] L. Choudhary, “MySQL architecture and components.” https://lalitvc.wordpress.com/2016/11/03/mysql-architecture-and-components/, 2016. Last accessed 2 August 2019. (cited on Page 5 and 20)

[7] S. Teotia, “Vectorized processing in analytical query engines.” https://loonytek.com/2018/04/26/vectorized-processing-in-analytical-query-engines/, 2016. Last accessed 4 August 2019. (cited on Page 5, 21, and 22)

[8] “CockroachDB.” https://github.com/cockroachdb/cockroach/blob/master/docs/design.md, 2018. (cited on Page 5 and 25)

[9] “TPC-C.” http://www.tpc.org/tpcc/. Last accessed 4 August 2019. (cited on Page 5, 27, 28, 29, and 30)

[10] F. Funke, A. Kemper, and T. Neumann, “Benchmarking hybrid OLTP&OLAP database systems,” in Datenbanksysteme für Business, Technologie und Web (BTW) (T. Härder, W. Lehner, B. Mitschang, H. Schöning, and H. Schwarz, eds.), (Bonn), pp. 390–409, Gesellschaft für Informatik e.V., 2011. (cited on Page 5, 29, and 32)

[11] F. Coelho, J. Paulo, R. Vilaça, J. Pereira, and R. Oliveira, “HTAPBench: Hybrid transactional and analytical processing benchmark,” in Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering, ICPE ’17, (New York, NY, USA), pp. 293–304, ACM, 2017. (cited on Page 5, 9, 2, 3, 10, 33, 34, 35, 36, 37, 38, 39, 43, 44, 45, 51, 88, and 92)

[12] J. Giceva and M. Sadoghi, Hybrid OLTP and OLAP, pp. 1–8, 01 2018. (cited on Page 1)

[13] H. Zhang, G. Chen, B. C. Ooi, K.-L. Tan, and M. Zhang, “In-memory big data management and processing: A survey,” IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 7, pp. 1920–1948, 2015. (cited on Page 1)

[14] M. Pezzini, D. Feinberg, N. Rayner, and R. Edjali, “Hybrid transaction/analytical processing will foster opportunities for dramatic business innovation,” 2014. https://www.gartner.com/en/documents/2657815. Last accessed 30 July 2019. (cited on Page 1, 7, and 31)

[15] I. Hrubaru and M. Fotache, “On the performance of three in-memory data systems for on line analytical processing,” Informatica Economica, vol. 21, pp. 5–15, 03 2017. (cited on Page 2)

[16] S. Elnaffar, P. Martin, and R. Horman, “Automatically classifying database workloads,” in Proceedings of the Eleventh International Conference on Information and Knowledge Management, CIKM ’02, (New York, NY, USA), pp. 622–624, ACM, 2002. (cited on Page 6 and 14)

[17] M. Stonebraker and L. A. Rowe, “The design of POSTGRES,” SIGMOD Rec., vol. 15, pp. 340–355, June 1986. (cited on Page 6 and 18)

[18] P. Boncz, “Monet: A next-generation DBMS kernel for query-intensive applications,” 01 2002. (cited on Page 7 and 22)

[19] A. Pavlo and M. Aslett, “What’s really new with NewSQL?,” SIGMOD Rec., vol. 45, pp. 45–55, Sept. 2016. (cited on Page 7)

[20] F. Özcan, Y. Tian, and P. Tözün, “Hybrid transactional/analytical processing: A survey,” in Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD ’17, (New York, NY, USA), pp. 1771–1775, ACM, 2017. (cited on Page 7)

[21] A. Böhm, J. Dittrich, N. Mukherjee, I. Pandis, and R. Sen, “Operational analytics data management systems,” Proc. VLDB Endow., vol. 9, pp. 1601–1604, Sept. 2016. (cited on Page 7 and 24)

[22] J. Giceva and M. Sadoghi, “Hybrid OLTP and OLAP,” 2019. (cited on Page 7)

[23] H. Zhang, G. Chen, B. C. Ooi, K.-L. Tan, and M. Zhang, “In-memory big data management and processing: A survey,” IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 7, pp. 1920–1948, 2015. (cited on Page 7)

[24] A. Pavlo, G. Angulo, J. Arulraj, H. Lin, J. Lin, L. Ma, P. Menon, T. Mowry, M. Perron, I. Quah, S. Santurkar, A. Tomasic, S. Toor, D. V. Aken, Z. Wang, Y. Wu, R. Xian, and T. Zhang, “Self-driving database management systems,” in CIDR 2017, Conference on Innovative Data Systems Research, 2017. (cited on Page 7)

[25] J. Arulraj, A. Pavlo, and P. Menon, “Bridging the archipelago between row-stores and column-stores for hybrid workloads,” in Proceedings of the 2016 International Conference on Management of Data, SIGMOD ’16, (New York, NY, USA), pp. 583–598, ACM, 2016. (cited on Page 7)

[26] K. Kim, T. Wang, R. Johnson, and I. Pandis, “ERMIA: Fast memory-optimized database system for heterogeneous workloads,” in Proceedings of the 2016 International Conference on Management of Data, SIGMOD ’16, (New York, NY, USA), pp. 1675–1687, ACM, 2016. (cited on Page 7)

[27] Y. Wu, J. Arulraj, J. Lin, R. Xian, and A. Pavlo, “An empirical evaluation of in-memory multi-version concurrency control,” Proceedings of the VLDB Endowment, vol. 10, no. 7, pp. 781–792, 2017. (cited on Page 8, 18, 48, 66, and 71)

[28] B. Darfler, “CockroachDB: A scalable, geo-replicated, transactional datastore.” https://www.infoq.com/news/2014/08/CockroachDB/. Last accessed 26 July 2019. (cited on Page 8 and 46)

[29] R. Nambiar, N. Wakou, F. Carman, and M. Majdalany, “Transaction Processing Performance Council (TPC): State of the council 2010,” in Performance Evaluation, Measurement and Characterization of Complex Systems (R. Nambiar and M. Poess, eds.), (Berlin, Heidelberg), pp. 1–9, Springer Berlin Heidelberg, 2011. (cited on Page 8 and 26)

[30] S. W. Dietrich, M. Brown, E. Cortes Rello, and S. Wunderlin, “A practitioner’s introduction to database performance benchmarks and measurements,” Comput. J., vol. 35, pp. 322–331, 08 1992. (cited on Page 8)

[31] D. E. Difallah, A. Pavlo, C. Curino, and P. Cudré-Mauroux, “OLTP-Bench: An extensible testbed for benchmarking relational databases,” Proc. VLDB Endow., vol. 7, pp. 277–288, Dec. 2013. (cited on Page 8)

[32] S. T. Leutenegger and D. Dias, “A modeling study of the TPC-C benchmark,” in Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, SIGMOD ’93, (New York, NY, USA), pp. 22–31, ACM, 1993. (cited on Page 9)

[33] S. Chen, A. Ailamaki, M. Athanassoulis, P. B. Gibbons, R. Johnson, I. Pandis, and R. Stoica, “TPC-E vs. TPC-C: Characterizing the new TPC-E benchmark via an I/O comparison study,” SIGMOD Rec., vol. 39, pp. 5–10, Feb. 2011. (cited on Page 9 and 27)

[34] P. Boncz, T. Neumann, and O. Erling, “TPC-H analyzed: Hidden messages and lessons learned from an influential benchmark,” in Performance Characterization and Benchmarking (R. Nambiar and M. Poess, eds.), (Cham), pp. 61–76, Springer International Publishing, 2014. (cited on Page 9)

[35] I. Hrubaru and M. Fotache, “On the performance of three in-memory data systems for on line analytical processing,” Informatica Economica, vol. 21, pp. 5–15, 03 2017. (cited on Page 9, 24, and 31)

[36] K. Kaur and M. Sachdeva, “Performance evaluation of NewSQL databases,” in 2017 International Conference on Inventive Systems and Control (ICISC), pp. 1–5, Jan 2017. (cited on Page 9)

[37] A. Bog, H. Plattner, and A. Zeier, “A mixed transaction processing and operational reporting benchmark,” Information Systems Frontiers, vol. 13, pp. 321–335, July 2011. (cited on Page 9)

[38] R. Cole, F. Funke, L. Giakoumakis, W. Guy, A. Kemper, S. Krompass, H. Kuno, R. Nambiar, T. Neumann, M. Poess, K.-U. Sattler, M. Seibold, E. Simon, and F. Waas, “The mixed workload CH-benCHmark,” in Proceedings of the Fourth International Workshop on Testing Database Systems, DBTest ’11, (New York, NY, USA), pp. 8:1–8:6, ACM, 2011. (cited on Page 9 and 31)

[39] A. Ganapathi, H. Kuno, U. Dayal, J. L. Wiener, A. Fox, M. Jordan, and D. Patterson, “Predicting multiple metrics for queries: Better decisions enabled by machine learning,” in Proceedings of the 2009 IEEE International Conference on Data Engineering, ICDE ’09, (Washington, DC, USA), pp. 592–603, IEEE Computer Society, 2009. (cited on Page 15)

[40] D. Kossmann, “The state of the art in distributed query processing,” ACM Comput. Surv., vol. 32, pp. 422–469, Dec. 2000. (cited on Page 15)

[41] M. Zhang, P. Martin, W. Powley, P. Bird, and D. Kalmuk, “A framework for autonomic workload management in DBMSs,” it - Information Technology, vol. 56, 01 2014. (cited on Page 15)

[42] B. Chandramouli, C. N. Bond, S. Babu, and J. Yang, “Query suspend and resume,” in Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, SIGMOD ’07, (New York, NY, USA), pp. 557–568, ACM, 2007. (cited on Page 15)

[43] “Transaction isolation levels in DBMS.” https://www.geeksforgeeks.org/transaction-isolation-levels-dbms/. Last accessed 3 August 2019. (cited on Page 17)

[44] “Contributor profiles.” PostgreSQL Global Development Group. (cited on Page 18)

[45] MySQL 8.0 Reference Manual. MySQL, 2019 (accessed July 28, 2019). (cited on Page 19 and 20)

[46] D. J. Abadi, P. A. Boncz, and S. Harizopoulos, “Column-oriented database systems,” Proc. VLDB Endow., vol. 2, pp. 1664–1665, Aug. 2009. (cited on Page 21)

[47] S. Idreos, F. Groffen, N. Nes, S. Manegold, K. S. Mullender, and M. L. Kersten, “MonetDB: Two decades of research in column-oriented database architectures,” IEEE Data Eng. Bull., vol. 35, no. 1, pp. 40–45, 2012. (cited on Page 22 and 23)

[48] “Architecture overview.” https://www.cockroachlabs.com/docs/stable/architecture/overview.html. Last accessed 6 August 2019. (cited on Page 25 and 26)
