Data Diversity. When MySQL can’t do everything.

Uploaded by kristian-koehntopp on 22-Jan-2018.


Page 1: Data Diversity at

Data Diversity.

When MySQL can’t do everything.

Page 2

Kristian Köhntopp.

• MySQL Scalability @ Booking, 2006-2014

• OpenStack @ SysEleven, 2014-2016

• Production Scalability @ Booking, 2016-

Page 3

A lot of MySQL.

• Over 6000 Instances running.

• Over 100 Replication chains.

• Very diverse workload and sizes.

Page 4

Trying to stay current. And failing.

Running mostly 5.6. Upgrade project stalled due to various problems.

Parts on 5.7. Where it is running, it performs nicely and provides highly desirable features.

Key population on MariaDB. We need to be able to execute our workload on both strains of MySQL at all times.

Testing Prereleases in production. We would like to find problems before GA.

Page 5

MySQL does a lot.

But not everything.

Page 6

Elasticsearch.

First “NoSQL” database.

• Used for search autocompletion, much faster than MySQL.

• Autocompletion now runs on an in-house system, Brick.

• ES is now used for log handling at Booking.

Page 7

Large clusters.

• 20+ clusters, the largest with 150 nodes, several hundred TB of syslog data.

• Hitting size limits, lots of toil:

  • ES clusters become unstable at size.

  • Sync times are unacceptable.

  • Index creation is blocking.

• Takes up to a minute for data to show up in searches in smaller clusters.

Page 8

Brick.

In-house full-text search system, based on Lucene. Managed garbage collection, tons of sort and ordering plugins for Lucene.

Page 9

Why not MySQL?

• Full-text search in MySQL is a joke... does not meet requirements.

• MySQL scalability limits: 5x memory.

  • Clustering, sharding story?

• ES is 2x-10x faster on small, simple data sets.

• The difference is more pronounced on larger sets.

Page 10

Working with Elastic?

• Support is not useful at our scale:

  • We know Java, the JVM, Lucene, cluster comms, scaling.

  • We match and track issues upstream.

  • We would be much more interested in engineering access instead.

• ES mostly works for us within its limits.

• Elastic is present on GitHub; we are present at their conferences and speak to their engineers.

• You get better support by not paying. WTF?

Page 11

Big Data.

2012: Aggregations fail. Aggregating one hour of data in the “dw” (Data Warehouse) schema took longer than one hour.

Page 12

Skunkworks. Take decommissioned database boxes, hand-install Hadoop, run proof-of-concept dw queries.

Manual parallelisation. Hand-crafted Perl utilises all cores on a single machine, buying us time to try things.

60 cores, 56x speedup. Running aggregations in parallel provides a nearly linear speedup.
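How "nearly linear" is 56x on 60 cores? Amdahl's law, S(N) = 1 / ((1 - p) + p/N), gives one way to read it. Taking the two rounded slide figures as exact (an assumption), this sketch solves for the parallelisable fraction p and the parallel efficiency:

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Speedup on n cores when a fraction p of the work runs perfectly in parallel."""
    return 1.0 / ((1.0 - p) + p / n)


def parallel_fraction(speedup: float, n: int) -> float:
    """Invert Amdahl's law: 1/S = (1 - p) + p/n  =>  p = (1 - 1/S) / (1 - 1/n)."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n)


cores, observed = 60, 56.0  # figures from the slide, taken as exact
p = parallel_fraction(observed, cores)
efficiency = observed / cores

print(f"parallel fraction p = {p:.5f}")                              # 0.99879
print(f"serial fraction     = {1 - p:.5f}")                          # 0.00121
print(f"parallel efficiency = {efficiency:.1%}")                     # 93.3%
print(f"1% serial on 60 cores: {amdahl_speedup(0.99, cores):.1f}x")  # 37.7x
```

A serial share of roughly 0.1% is what makes the curve look linear. By the same formula, a 1% serial share would already cap 60 cores at about 37.7x.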

Page 13

Today.

Getting a budget. Run a proper install, learn a lot: standard HP boxes do not work so well; using a distro vs. using vanilla Hadoop.

Eight clusters. 8 clusters + 2 sandboxes, 4 with Hive, 27 PB of data present (going back to 2012-08; going to cut off at 2 years).

Page 14

Running Hadoop.

Events and event processing, used in everything from monitoring to BI.

Aggregations, DWH/BI, reporting, analytics.

Page 15

Running HBase.

Real-time monitoring, front-end roundtripping, MySQL Time Machine & replication, Mesos/Marathon integration.

Page 19

Frontend to Hadoop and back. Data is collected, aggregated and filtered back, in real time.

Aggregated Visitors. Aggregate front-end logs, produce per-object looker statistics.

Aggregated Bookers. Aggregate reservation stats, produce per-object booker statistics.

Aggregated Properties. Per location, find similar properties, produce set counts.
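The per-object statistics boil down to a group-by with distinct counts. In production this runs as Hadoop/Hive jobs; the single-machine sketch below only shows the grouping logic. The `Event` record layout and the "view"/"book" kinds are hypothetical, since the talk does not show the log schema:

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Event(NamedTuple):
    # Hypothetical record layout; the real front-end log schema is not shown in the talk.
    object_id: int   # e.g. a property id
    visitor_id: str  # anonymous visitor identifier
    kind: str        # "view" for a look, "book" for a reservation


def per_object_stats(events: Iterable[Event]) -> Dict[int, Dict[str, int]]:
    """Group events by object and count distinct lookers and bookers."""
    lookers = defaultdict(set)
    bookers = defaultdict(set)
    for e in events:
        if e.kind == "view":
            lookers[e.object_id].add(e.visitor_id)
        elif e.kind == "book":
            bookers[e.object_id].add(e.visitor_id)
    objects = sorted(lookers.keys() | bookers.keys())
    return {
        obj: {"lookers": len(lookers.get(obj, ())), "bookers": len(bookers.get(obj, ()))}
        for obj in objects
    }


events = [
    Event(1, "v1", "view"), Event(1, "v1", "view"),  # repeat views count once
    Event(1, "v2", "view"), Event(1, "v2", "book"),
    Event(2, "v3", "view"),
]
print(per_object_stats(events))
# {1: {'lookers': 2, 'bookers': 1}, 2: {'lookers': 1, 'bookers': 0}}
```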

Page 20

Hadoop Cluster Stats

[lhr4] $ hdfs dfs -df -h
Filesystem           Size    Used    Available  Use%
hdfs://nameservice1  41.2 P  27.0 P  10.8 P     66%

[lhr4] $ htloc raw_events
/hive_tables/raw_tables/raw_events

[lhr4] $ hdfs dfs -du -h -s /hive_tables/raw_tables/raw_events
4.9 P  14.6 P  /hive_tables/raw_tables/raw_events
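Reading these numbers: 27.0 P / 41.2 P is about 65.5%, shown rounded as 66%. For `-du`, the Hadoop FileSystem shell documentation describes the two size columns as the logical size and the disk space consumed with all replicas. Assuming that layout, this sketch parses the line and derives the effective replication factor and raw_events' share of the used space:

```python
from typing import List, Tuple

# -h output uses binary prefixes (assumed here); the ratios below do not depend on it.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4, "P": 1024**5}


def parse_du_line(line: str) -> Tuple[List[float], str]:
    """Parse '<size> [unit] <size> [unit] ... <path>' into byte counts and the path."""
    *fields, path = line.split()
    sizes: List[float] = []
    for tok in fields:
        if tok in UNITS:          # a unit token scales the number before it
            sizes[-1] *= UNITS[tok]
        else:
            sizes.append(float(tok))
    return sizes, path


sizes, path = parse_du_line("4.9 P  14.6 P  /hive_tables/raw_tables/raw_events")
logical, consumed = sizes
replication = consumed / logical                # about 2.98, i.e. ~3 replicas
share_of_used = consumed / (27.0 * UNITS["P"])  # share of the 27.0 P used cluster-wide
print(f"{path}: effective replication {replication:.2f}, {share_of_used:.0%} of used space")
# /hive_tables/raw_tables/raw_events: effective replication 2.98, 54% of used space
```

So a single Hive table, replicated roughly three times, accounts for over half of the used HDFS capacity.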

Page 21

Mostly Hive.

• Over 95% of the workload: HQL.

• Declarative, powerful, familiar.

• ODBC Endpoint: Excel, R.

• Well known to data people, good tooling.

Page 23

Quietly HBase.

• Running for one year now, 400 nodes.

• Real-time metrics.

• Not widely advertised.

• “Have you considered MySQL first?”

(Figure panels: MySQL Time Machine; HBase Replication; Matterhorn 1.)
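The idea behind a time machine over replicated MySQL changes, as we understand it, is to keep every version of a row keyed by its change timestamp, much like HBase cells carry timestamps, so a table can be read as of any past moment. The sketch below is a minimal in-memory illustration. The class and method names are hypothetical and are not the MySQL Time Machine's actual API:

```python
import bisect
from collections import defaultdict
from typing import Optional


class VersionedTable:
    """Keeps every version of each row, ordered by change timestamp."""

    def __init__(self) -> None:
        # row key -> parallel lists of change timestamps and row images, sorted by time
        self._times = defaultdict(list)
        self._rows = defaultdict(list)

    def apply_change(self, key: str, ts: int, row: Optional[dict]) -> None:
        """Record the row image from one replicated change (None means deleted)."""
        i = bisect.bisect_right(self._times[key], ts)
        self._times[key].insert(i, ts)
        self._rows[key].insert(i, row)

    def get_as_of(self, key: str, ts: int) -> Optional[dict]:
        """Return the row as it looked at time ts, or None if absent or deleted."""
        times = self._times.get(key, [])
        i = bisect.bisect_right(times, ts)
        return self._rows[key][i - 1] if i else None


table = VersionedTable()
table.apply_change("hotel:42", 100, {"price": 90})
table.apply_change("hotel:42", 200, {"price": 120})
table.apply_change("hotel:42", 300, None)  # row deleted in MySQL
print(table.get_as_of("hotel:42", 150))    # {'price': 90}
print(table.get_as_of("hotel:42", 250))    # {'price': 120}
print(table.get_as_of("hotel:42", 350))    # None
```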

Page 24

Why not MySQL?

• MySQL did not scale to parallel processing.

• MySQL sucks at... does not meet requirements for DWH processing.

• MySQL and petabytes do not mix.

Page 25

Hadoop Ecosystem?

• Hadoop is a lot like Lego: under load, parts fall off.

• CDH is 1+ year behind current Hadoop.

• Few clusters, many interests: diverse workload, interference.

• Operational challenges: no interruptions.

• Toil, bugs, interference.

Page 28

The distro question.

• Hadoop: not built for deployment from vanilla repos.

• Vendor per-seat licensing does not scale for us.

• Control issues: if you do stuff past our admin system, you are off support.

• Integration with existing provisioning and monitoring.

• Build in-house know-how.

• Booking:

  • save on licensing, hire people instead.

  • partnership vs. customer/vendor.

• Some contributions upstream, but not very active.

Page 31

Educating users.

• Isolation problems, known defects.

• Schema and access require a lot of rethinking:

  • Table design, index usage.

  • Hotspotting (overwhelming a single node).

• Mandatory user education.

• Mandatory table review.

• This is not the Booking.com way.
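Hotspotting typically comes from row keys that grow monotonically, such as keys that start with a timestamp. HBase keeps rows sorted by key in contiguous regions, so every new write lands in the same region. One common remedy is to salt the key with a small hash-derived prefix. This is a generic sketch with a hypothetical key layout and bucket count, not Booking's actual table design:

```python
import hashlib
from typing import List, Tuple

NUM_BUCKETS = 8  # hypothetical; often chosen around the number of region servers


def naive_row_key(ts_ms: int, event_id: str) -> bytes:
    # Timestamp first: every new row sorts after all existing ones, so all inserts
    # land in the last region, i.e. on a single node.
    return f"{ts_ms:013d}|{event_id}".encode("ascii")


def salted_row_key(ts_ms: int, event_id: str, buckets: int = NUM_BUCKETS) -> bytes:
    # A stable hash of the event id picks a two-digit prefix, so concurrent inserts
    # spread over `buckets` separate key ranges that can live in different regions.
    bucket = hashlib.md5(event_id.encode("ascii")).digest()[0] % buckets
    return f"{bucket:02d}|{ts_ms:013d}|{event_id}".encode("ascii")


def scan_ranges(start_ms: int, stop_ms: int,
                buckets: int = NUM_BUCKETS) -> List[Tuple[bytes, bytes]]:
    # The price of salting: a time-range read becomes one [start, stop) scan per bucket.
    return [
        (f"{b:02d}|{start_ms:013d}".encode("ascii"), f"{b:02d}|{stop_ms:013d}".encode("ascii"))
        for b in range(buckets)
    ]


print(naive_row_key(1500000000000, "evt-1"))           # b'1500000000000|evt-1'
print(len(scan_ranges(1500000000000, 1500003600000)))  # 8
```

This is exactly the kind of trade-off (cheap writes, fan-out reads) that makes table review worthwhile.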

Page 32

The hardware question.

• Enterprise hardware: smart controllers, small fast disks, expensive redundancy features.

• All of this is useless for Hadoop.

• Buy in bulk from a Taiwanese maker.

Page 34

Value-adding yourself into obsolescence.

https://twitter.com/davykestens/status/654334869267312640

Page 35

Graphing.

Graphite at Booking: a hackathon project that escaped from the cage.

32 frontend servers, 200 store servers in two data centers, 2M unique metrics per second (8M hitting the stores), 130 TB of metrics in total, 11 Gbps of traffic on the backend.
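Each of those metrics arrives as a datapoint in Graphite's plaintext protocol. That means one line per datapoint, `<metric path> <value> <timestamp>`, with the timestamp in Unix epoch seconds, sent over TCP (port 2003 by default) to carbon or a relay. This minimal client sketch uses hypothetical metric names, and batching the lines into one write is our own choice:

```python
import socket
import time
from typing import Iterable, Optional


def format_metric(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Render one datapoint as a plaintext-protocol line: '<path> <value> <timestamp>' + newline."""
    if not path or any(ch.isspace() for ch in path):
        raise ValueError("metric path must be non-empty and contain no whitespace")
    ts = int(time.time()) if timestamp is None else timestamp
    return f"{path} {value} {ts}\n"


def send_metrics(lines: Iterable[str], host: str = "localhost", port: int = 2003) -> None:
    """Send a batch of lines in one write to a carbon daemon or relay (2003 is the usual port)."""
    payload = "".join(lines).encode("utf-8")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)


batch = [
    format_metric("servers.web01.requests", 42, 1500000000),
    format_metric("servers.web01.latency_ms", 12.5, 1500000000),
]
print("".join(batch), end="")
# servers.web01.requests 42 1500000000
# servers.web01.latency_ms 12.5 1500000000
```

The format is trivial, which is part of why the cost sits in the relays and stores: at millions of lines per second, every byte and every parse counts.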

Page 37

Graphite setup. After mutation.

Graphite @ Scale: How to store million metrics per second. Vladimir Smirnov, LinuxCon Europe 2016, 5 October 2016.

Page 38

It’s on GitHub.

carbonzipper: github.com/dgryski/carbonzipper

carbonserver: github.com/grobian/carbonserver

carbonapi: github.com/dgryski/carbonapi

carbon-c-relay: github.com/grobian/carbon-c-relay
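A relay such as carbon-c-relay routes each metric to store servers by hashing the metric name. To our understanding, its cluster types include consistent-hashing schemes such as carbon_ch, fnv1a_ch and jump_fnv1a_ch. The sketch below illustrates the last idea: 64-bit FNV-1a over the name, then jump consistent hashing (Lamping & Veach) onto N stores. It is not carbon-c-relay's code and makes no claim to reproduce its exact assignments. The store names are hypothetical:

```python
from typing import List

FNV64_OFFSET = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: xor each byte in, then multiply by the FNV prime (mod 2**64)."""
    h = FNV64_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64
    return h


def jump_hash(key: int, num_buckets: int) -> int:
    """Jump consistent hash (Lamping & Veach): map a 64-bit key to [0, num_buckets)."""
    b, j = -1, 0
    while j < num_buckets:
        b = j
        key = (key * 2862933555777941757 + 1) & MASK64
        j = int((b + 1) * (float(1 << 31) / float((key >> 33) + 1)))
    return b


def store_for(metric: str, stores: List[str]) -> str:
    """Route a metric name to one store server."""
    return stores[jump_hash(fnv1a_64(metric.encode("utf-8")), len(stores))]


stores = [f"store{i:03d}" for i in range(200)]  # hypothetical names, 200 as on the slide
print(store_for("servers.web01.requests", stores))
```

The property that matters at this size is that growing from n to n+1 stores moves only about 1/(n+1) of the metrics, all of them onto the new store. The catch is that jump hashing assumes stores are numbered and only added or removed at the end.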

Page 39

Why not MySQL?

• MySQL does not meet the requirements for time series data.

• Data deletion kills the server (partitions help a bit).

• MySQL does not cluster, so there is a scalability limit.
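Why partitions help a bit: in a table RANGE-partitioned by day, expiring old data becomes `ALTER TABLE ... DROP PARTITION`. That discards whole partitions at once, where a `DELETE` has to find and remove every expired row (with undo and, under row-based replication, a binlog event per row). This sketch assumes a hypothetical `metrics` table with one partition per day named `pYYYYMMDD` and generates the retention statement:

```python
from datetime import date, timedelta
from typing import List


def partition_name(day: date) -> str:
    # Hypothetical scheme: one RANGE partition per day, named pYYYYMMDD.
    return "p" + day.strftime("%Y%m%d")


def expired_partitions(existing: List[str], today: date, retention_days: int) -> List[str]:
    """Daily partitions that lie entirely before the retention window."""
    cutoff = partition_name(today - timedelta(days=retention_days))
    # Fixed-width pYYYYMMDD names sort chronologically as plain strings.
    return sorted(p for p in existing if p < cutoff)


def drop_statement(table: str, partitions: List[str]) -> str:
    """One DDL statement that discards whole partitions instead of DELETEing rows."""
    if not partitions:
        raise ValueError("nothing to drop")
    return f"ALTER TABLE {table} DROP PARTITION {', '.join(partitions)};"


existing = [partition_name(date(2016, 9, 25) + timedelta(days=i)) for i in range(12)]
old = expired_partitions(existing, today=date(2016, 10, 5), retention_days=7)
print(drop_statement("metrics", old))
# ALTER TABLE metrics DROP PARTITION p20160925, p20160926, p20160927;
```

It only helps "a bit" because the partitions still have to be created ahead of time, and everything else about time series on a single, non-clustered server stays the same.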

Page 40

Graphite problems.

• Python does not scale. At all.

  • Practically rewrote the backend completely in Go and C.

• It won’t grow another 10x.

  • Ops-heavy: the work scales linearly with cluster size, and metrics discovery needs improvement. The storage side is a goner.

• Time series data at scale is hard.

Page 41

Graphite and Community.

• “Raintank”, a startup in the monitoring space, is hiring Grafana and Graphite people.

• We are working with the Graphite community, but we have probably outgrown all their use cases.

• We are most likely on our own with all of this.

Page 42

Cassandra.

2014: PoC for an S3-like photo storage: BOSOS (Booking Simple Object STorage).

2015: Perl tooling (previously: Go tooling), DBD::cassandra, mearc (40T MySQL instance) because of the MariaDB Cassandra engine, bizmail archive, RTM, event lookup table, “that chatbot project”, PII storage?

2016: DBD::cassandra becomes a CPAN module, experiment tool data.

Page 43

Cassandra

15 clusters, 200 nodes, 700 TB of data, 3 datacenters, 1-2 ms response time.

Page 44

How we are using it.

• Any large data set that needs sharding and simple access.

• If you need indexes, sharding is a pain, but Cassandra does that.

• Using CQL (limited features, but declarative).
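What "Cassandra does that" means, sketched. A partitioner hashes each partition key to a token, and every node claims tokens on a ring. A key lives on the node owning the first token at or after its own, and further replicas go to the next distinct nodes clockwise (SimpleStrategy-style; rack and datacenter awareness omitted). Cassandra's default Murmur3Partitioner is not in Python's stdlib, so this sketch substitutes MD5, which the older RandomPartitioner used. Node names and the vnode count are hypothetical:

```python
import bisect
import hashlib
from typing import List


def token(key: str) -> int:
    # Stand-in for Murmur3 (not in the stdlib): 64 bits of MD5 mapped to a signed range,
    # like Murmur3Partitioner's tokens from -2**63 to 2**63 - 1.
    raw = int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")
    return raw - 2**63


class TokenRing:
    """Nodes claim tokens on a ring; a key belongs to the first token at or after its own."""

    def __init__(self, nodes: List[str], vnodes: int = 16) -> None:
        # Each node claims several tokens ("virtual nodes"), spreading its share of the ring.
        self._ring = sorted((token(f"{node}#{i}"), node) for node in nodes for i in range(vnodes))
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, partition_key: str, rf: int = 3) -> List[str]:
        """Owning node plus the next distinct nodes clockwise (SimpleStrategy-like)."""
        start = bisect.bisect_left(self._tokens, token(partition_key))
        chosen: List[str] = []
        for step in range(len(self._ring)):
            node = self._ring[(start + step) % len(self._ring)][1]  # wraps around the ring
            if node not in chosen:
                chosen.append(node)
                if len(chosen) == rf:
                    break
        return chosen


nodes = ["cass-a", "cass-b", "cass-c", "cass-d", "cass-e"]  # hypothetical node names
ring = TokenRing(nodes)
print(ring.replicas("user:42"))  # three distinct nodes, chosen by the key's token alone
```

No proxy has to know where data lives: any node can compute the replicas for a key from the ring, which is the automated sharding MySQL lacks.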

Page 45

Why not MySQL?

• No clustering, scaling limits.

• No automated sharding; sharding by proxy is dangerous.

• Large BLOB storage in MySQL does not meet requirements.

• We tried the MariaDB Cassandra engine, but that did not work out.

Page 46

Cassandra problems.

• Individual node failures: a non-issue. Rolling restarts are fine.

• “Never run the latest version”: we are running the oldest supported version.

• Secondary indexes do not work the way one would expect from MySQL. We are still recovering from that.
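One concrete difference, as we understand Cassandra's built-in secondary indexes, is that each node indexes only the data it stores locally. A query on an indexed column that does not also restrict the partition key therefore has to be answered by potentially every node. A MySQL index, by contrast, is one B-tree on one server. This toy simulation only counts nodes contacted. It is not Cassandra's implementation, and the single-replica crc32 placement is a simplification:

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.rows: Dict[str, dict] = {}
        # Node-local index: indexed value -> keys of rows stored on THIS node only.
        self.local_index = defaultdict(set)

    def put(self, key: str, row: dict, indexed_column: str) -> None:
        self.rows[key] = row
        self.local_index[row[indexed_column]].add(key)


class ToyCluster:
    def __init__(self, node_names: List[str], indexed_column: str) -> None:
        self.nodes = [Node(n) for n in node_names]
        self.indexed_column = indexed_column

    def _owner(self, key: str) -> Node:
        # Simplified placement: one replica, chosen by a stable hash of the partition key.
        return self.nodes[zlib.crc32(key.encode("utf-8")) % len(self.nodes)]

    def insert(self, key: str, row: dict) -> None:
        self._owner(key).put(key, row, self.indexed_column)

    def get_by_key(self, key: str) -> Tuple[Optional[dict], int]:
        """Partition-key read: one node is contacted."""
        return self._owner(key).rows.get(key), 1

    def get_by_index(self, value: str) -> Tuple[List[dict], int]:
        """Indexed-column read: no node knows the others' rows, so every node is asked."""
        found: List[dict] = []
        for node in self.nodes:
            found.extend(node.rows[k] for k in sorted(node.local_index.get(value, ())))
        return found, len(self.nodes)


cluster = ToyCluster([f"node{i}" for i in range(6)], indexed_column="city")
for i in range(60):
    cluster.insert(f"hotel:{i}", {"id": i, "city": "Amsterdam" if i % 3 == 0 else "Berlin"})
row, contacted = cluster.get_by_key("hotel:7")
print(row, contacted)        # {'id': 7, 'city': 'Berlin'} 1
rows, contacted = cluster.get_by_index("Amsterdam")
print(len(rows), contacted)  # 20 6
```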

Page 47

Cassandra and community.

• We do not have commercial support; we run the community version. We are considering buying support for the community version, but as of now our deployment is not large enough to justify that.

• Considering NRE (non-recurring engineering) for a few things.

• The Cassandra community on IRC is amazing.

• We sometimes file tickets and are beginning to upstream fixes: DBD::cassandra, Cassandra::client.

Page 48

Redis.

Used a lot as a better memcache, and as a queue.

Used with persistence only in sessapp (temporary storage for session data, LRU-expiring to MySQL; previously a hacked version of Redis), 50 + 50 nodes.
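The talk does not say how sessapp's LRU expiry to MySQL is implemented (it mentions a previously hacked version of Redis). Redis's own eviction policies only drop keys and do not write them anywhere. So the following is a generic two-tier sketch under our own assumptions: a bounded in-memory LRU that writes evicted sessions to a durable store, with a plain dict standing in for MySQL and hypothetical names throughout:

```python
from collections import OrderedDict
from typing import Dict, Optional


class SessionCache:
    """Bounded in-memory LRU tier that spills evicted sessions to a durable store."""

    def __init__(self, capacity: int, durable: Dict[str, str]) -> None:
        self.capacity = capacity
        self.hot = OrderedDict()  # oldest entry first, most recently used last
        self.durable = durable    # stand-in for the MySQL table

    def put(self, session_id: str, data: str) -> None:
        self.hot[session_id] = data
        self.hot.move_to_end(session_id)  # mark as most recently used
        while len(self.hot) > self.capacity:
            old_id, old_data = self.hot.popitem(last=False)  # least recently used
            self.durable[old_id] = old_data                  # expire it to the durable store

    def get(self, session_id: str) -> Optional[str]:
        if session_id in self.hot:
            self.hot.move_to_end(session_id)
            return self.hot[session_id]
        data = self.durable.get(session_id)
        if data is not None:
            self.put(session_id, data)  # promote back into the hot tier
        return data


mysql_stub: Dict[str, str] = {}
cache = SessionCache(capacity=2, durable=mysql_stub)
cache.put("s1", "cart=1")
cache.put("s2", "cart=2")
cache.put("s3", "cart=3")  # evicts s1 to mysql_stub
print(sorted(mysql_stub))  # ['s1']
print(cache.get("s1"))     # cart=1 (read back and promoted; s2 is evicted now)
```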

Page 49

Redis.

No support, no community work.

Standard deployment, boring.

Page 50

Also using: Postgres, Riak, RocksDB.

Postgres: used in our Puppet deployment.

Riak: used in the event processing pipeline, future unclear.

RocksDB: used in “Smart AV”, a dedicated availability service.

Page 51

MongoDB?

• Supposedly there is MongoDB running somewhere at Booking.

• Have been searching for it.

• Found no one willing to admit to it.

Page 52

Conclusions for MySQL @ Booking.

• Recurring theme: need to shard, distributed database, multithreaded per shard.

• MySQL breaks down at special use cases:

  • Fulltext, DWH, TSDB, BLOB store, column store.

Page 53

Conclusions for NoSQL @ Booking.

• Trend towards declarative languages in NoSQL systems.

• Distributed systems are hard (Surprise!)

Page 54

Conclusions for upstream interaction.

• We tend to become self-supporting.

• We tend to become contributors in some way.

• We tend to need “engineering access”.

Page 55

We are hiring.

Check https://workingatbooking.com/ or talk to us at our booth.