Towards Understanding the Performance of Distributed Database Management Systems in Volatile Environments

Jörg Domaschka and Daniel Seybold

Institute of Information Resource Management

Ulm University | Ulm | Germany


Page 2 06.11.2019 | Symposium on Software Performance | Towards Understanding the Performance of Distributed DBMS in Volatile Environments

Current Trends of Data-intensive Applications

application domains & requirements: Web 2.0, Big Data, IoT

application architectures

infrastructures


requirements: performance, scalability, elasticity, availability


https://www.gartner.com/en/documents/3941821/the-future-of-the-dbms-market-is-cloud


Contribution

How to operate distributed DBMS in the cloud?

Insights into operating DBMS on cloud resources:

distributed DBMS impact factors

cloud resource impact factors

selected DBMS and cloud resource-centric evaluation results


the results summarize the insights of a series of DBMS evaluation publications


Distributed DBMS Impact Factors


data models [1], sharding & scale [1, 2], replication & consistency [2]

[1] Mazumdar, S., Seybold, D., Kritikos, K., & Verginadis, Y. (2019). A survey on data storage and placement methodologies for cloud-big data ecosystem. Journal of Big Data, 6(1), 15.

[2] Domaschka, J., Hauser, C. B., & Erb, B. (2014, September). Reliability and availability properties of distributed database systems. In 2014 IEEE 18th International Enterprise Distributed Object Computing Conference (pp. 226-233). IEEE.


data models: RDBMS, NewSQL, NoSQL

sharding & scale: cluster size, architecture, sharding mechanism

replication & consistency: consistency model, replication mechanism, replication factor

> 220 NoSQL & 20 NewSQL DBMS on the market (source: http://nosql-database.org/)


DBMS impact factors (affecting scalability, elasticity, availability, performance)

data model: RDBMS, NewSQL, NoSQL, …

architecture: single, master-slave, multi-master

cluster size

sharding: range, hash

replication: factor, scope

consistency model: ACID, BASE

client-side consistency

legend: configurable, predefined
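
The two sharding mechanisms named in this overview, range and hash, can be illustrated with a minimal sketch. It is not taken from the talk or from any particular DBMS; the function names, the split points, and the choice of MD5 as the hash function are assumptions made purely for illustration.

```python
import bisect
import hashlib
from typing import Sequence


def hash_shard(key: str, num_shards: int) -> int:
    """Hash sharding: pick a shard from a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def range_shard(key: str, split_points: Sequence[str]) -> int:
    """Range sharding: shard i holds keys below split_points[i] and at or
    above split_points[i - 1]; the last shard holds everything else.
    split_points must be sorted."""
    return bisect.bisect_right(split_points, key)


if __name__ == "__main__":
    splits = ["user0003", "user0006"]  # hypothetical boundaries for 3 shards
    for i in range(8):
        key = f"user{i:04d}"
        print(key, "hash ->", hash_shard(key, 3), "range ->", range_shard(key, splits))
```

Under these assumptions, range sharding keeps neighbouring keys on the same shard, while hash sharding scatters them across shards regardless of key order.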


Cloud Resource Impact Factors


provider [1], resource type [2], resource characteristics [3]

[1] Baur, D., Seybold, D., Griesinger, F., Masata, H., & Domaschka, J. (2018, May). A provider-agnostic approach to multi-cloud orchestration using a constraint language. In Proceedings of the 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (pp. 173-182). IEEE Press.

[2] Seybold, D., Hauser, C. B., Eisenhart, G., Volpert, S., & Domaschka, J. (2018, August). The Impact of the Storage Tier: A Baseline Performance Analysis of Containerized DBMS. In European Conference on Parallel Processing (pp. 93-105). Springer, Cham.

[3] Seybold, D., Hauser, C. B., Volpert, S., & Domaschka, J. (2017, October). Gibbon: An availability evaluation framework for distributed databases. In OTM Confederated International Conferences "On the Move to Meaningful Internet Systems" (pp. 31-49). Springer.


provider: public, private, resource offerings

resource type: bare metal, VM, container, storage, sizing

resource characteristics: interferences, failures

> 20,000 public cloud resource offerings (source: https://cloudharmony.com/)


cloud resource impact factors (affecting scalability, elasticity, availability, performance)

provider: AWS, …, OpenStack

resource type: VM, bare metal, container

storage: HDD, SSD, remote

sizing

characteristics: interferences, failures

legend: configurable, predefined


How to operate distributed DBMS in the Cloud?


Evaluation Scenario: Client-Consistency & Cluster Size


evaluation environment

provider: OpenStack Ulm

resource: VM

sizing: 2 cores, 4 GB memory, SSD storage

DBMS: Apache Cassandra

version: 3.11

Workload: YCSB

Type: write-heavy

complete evaluation details: Seybold, D., Keppler, M., Gründler, D., & Domaschka, J. (2019, April). Mowgli: Finding your way in the DBMS jungle. In Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering (pp. 321-332). ACM.


[figure: consistency – performance impact (Apache Cassandra)]


[figure: scalability (Apache Cassandra)]


evaluation environment

provider: OpenStack Ulm

resource: VM

sizing: 2 cores, 4 GB memory, SSD storage

DBMS: Couchbase

version: 5.0.1

Workload: YCSB

Type: write-heavy

complete evaluation details: Seybold, D., Keppler, M., Gründler, D., & Domaschka, J. (2019, April). Mowgli: Finding your way in the DBMS jungle. In Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering (pp. 321-332). ACM.

[figure: consistency – performance impact (Couchbase)]


[figure: scalability (Couchbase)]


Lessons Learned: Client-Consistency & Cluster Size

minor changes to the DBMS configuration may have significant performance impact

scalability depends on (resources [1]), DBMS runtime configuration and workload properties

[1] Seybold, D., Keppler, M., Gründler, D., & Domaschka, J. (2019, April). Mowgli: Finding your way in the DBMS jungle. In Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering (pp. 321-332). ACM.
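
One way to see why a small configuration change can matter is the usual quorum arithmetic of Dynamo-style stores such as Apache Cassandra, where the client-side consistency level decides how many replicas must acknowledge an operation. The sketch below is an illustration only, not the configuration used in the evaluation; the function names and the replication factor of 3 are assumptions.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def acks_required(level: str, replication_factor: int) -> int:
    """Replica acknowledgements needed for a given consistency level."""
    level = level.upper()
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return quorum(replication_factor)
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def reads_see_latest_write(write_level: str, read_level: str,
                           replication_factor: int) -> bool:
    """True if every read replica set overlaps every write replica set,
    i.e. R + W > RF."""
    w = acks_required(write_level, replication_factor)
    r = acks_required(read_level, replication_factor)
    return r + w > replication_factor


if __name__ == "__main__":
    rf = 3  # hypothetical replication factor
    for level in ("ONE", "QUORUM", "ALL"):
        print(level, "waits for", acks_required(level, rf), "of", rf, "replicas")
    print("QUORUM/QUORUM overlaps:", reads_see_latest_write("QUORUM", "QUORUM", rf))
    print("ONE/ONE overlaps:", reads_see_latest_write("ONE", "ONE", rf))
```

Moving a write-heavy workload from ONE to QUORUM or ALL makes each write wait for more replicas, which is a plausible mechanism behind the observed sensitivity, although the slides do not state the exact settings compared.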


Evaluation Scenario: Resource Types


evaluation environment

provider: OpenStack Ulm

resource: VM

sizing: 4 cores, 4 GB memory, SSD storage

DBMS: MongoDB

version: 3.6.3

Workload: YCSB

Type: write-heavy

complete evaluation details: Seybold, D., Hauser, C. B., Eisenhart, G., Volpert, S., & Domaschka, J. (2018, August). The Impact of the Storage Tier: A Baseline Performance Analysis of Containerized DBMS. In European Conference on Parallel Processing (pp. 93-105). Springer, Cham.

[figure: evaluated deployment stacks, from host to DBMS]

physical, container, DBMS (shown twice)

physical, VM, DBMS

physical, VM, container, DBMS


Lessons Learned: Resource Types

virtualization reduces DBMS performance

storage location is an important and challenging decision for operating DBMS in the cloud

DBMS in containers on VMs introduce negligible overhead compared to the operational benefits


Conclusion & Outlook


Conclusion

DBMS evaluations need to consider DBMS, cloud resource and workload characteristics

comprehensive DBMS evaluations are technically challenging, time-consuming and error-prone

Tool support is required!


Mowgli Framework: fully automates DBMS evaluations and enables reproducible and portable evaluations!
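
To give a feel for why manual evaluation does not scale, the following sketch enumerates combinations of a few impact factors named on the earlier slides. It is not how Mowgli models evaluation scenarios; the dictionary layout, the function name, and the cluster sizes are assumptions, and the option lists are abbreviated.

```python
from itertools import product
from typing import Dict, List

# Options taken from the impact-factor overviews; cluster sizes are hypothetical.
IMPACT_FACTORS: Dict[str, List] = {
    "data_model": ["RDBMS", "NewSQL", "NoSQL"],
    "architecture": ["single", "master-slave", "multi-master"],
    "sharding": ["range", "hash"],
    "cluster_size": [3, 5, 7],
    "provider": ["AWS", "OpenStack"],
    "resource_type": ["VM", "bare metal", "container"],
    "storage": ["HDD", "SSD", "remote"],
}


def enumerate_scenarios(factors: Dict[str, List]) -> List[dict]:
    """Build one scenario per combination of factor values."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


if __name__ == "__main__":
    scenarios = enumerate_scenarios(IMPACT_FACTORS)
    print(len(scenarios), "scenarios, e.g.", scenarios[0])
```

Even this abbreviated grid yields 972 scenarios before workload types or consistency settings are added, which illustrates the case for automation.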


Outlook

Advanced distributed DBMS evaluations: complex DBMS workloads – DBMS elasticity and availability – self-hosted DBMS vs. DBaaS

Automated DBMS operation in the cloud


Thank you!

The research leading to these results has received funding from the EC's Framework Programme HORIZON 2020 under grant agreement numbers 731664 (MELODIC) and 732667 (RECAP).

Mowgli Software: https://omi-gitlab.e-technik.uni-ulm.de/mowgli

Release 0.1: https://zenodo.org/record/3341512#.XcFnRehKiUk

DBMS Evaluation Data Sets:

Performance & Scalability: https://zenodo.org/record/3518786#.XcFnf-hKiUk

Elasticity: https://zenodo.org/record/3362279#.XcFnmehKiUk

Containerized DBMS: https://github.com/omi-uulm/Containerized-DBMS-Evaluation