less is more: novel approaches to mysql compression for modern data sets - percona live 2016

Novel Approaches to MySQL Compression for Modern Data Sets Less Is More Ernie Souhrada Database Engineer / Bit Wrangler, Pinterest Percona Live Data Performance Conference – 19 April 2016 1


Page 1: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

Novel Approaches to MySQL Compression for Modern Data Sets Less Is More

Ernie Souhrada Database Engineer / Bit Wrangler, Pinterest Percona Live Data Performance Conference – 19 April 2016 1

Page 2: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

•  Introductions
•  The Data Explosion
•  Stand Back, I’m Going to Math
•  So Many Options, So Little CPU
•  Don’t Try This At Home
•  Not Your Grandfather’s GZIP
•  Ooh, Shiny Numbers!
•  Q&A

Agenda

2 Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets– Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016

My god, it’s full of cats!

Page 3: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

Who am I?
•  Database Engineer at Pinterest (January 2015)
   –  One of two people solely responsible for hundreds of TB of MySQL data
   –  Also loosely affiliated with HBase and Core SRE teams
•  Previously: Percona, Sun, assorted random small companies
•  Jack of many trades, master of some

Why am I here?
•  Interested in almost EVERYTHING (not just tech)
•  Mathematician by training; compression is fundamentally a math problem.

Who Am I, Why Am I Here?


Turning technical skill into cat food since 1996

Page 4: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

“Every two days now we create as much information as we did from the dawn of civilization up to 2003.” – Eric Schmidt, Google [1]

He said this in 2010.
•  Mostly user-generated content
   –  Over 2 million cat videos on YouTube in 2015 [2]
   –  Lots of unstructured data, not easily put into relational form
•  Don’t forget the NSA!
   –  Although nobody really knows how much data they have…

The Data Explosion


Because ‘DELETE’ is a four-letter word.

Page 5: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

The Data Explosion


In 2012, there were 2.1 billion people on the internet[3]

2012

Page 6: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

The Data Explosion


Two years later, that number rose to 2.4 billion[4]

2014

Page 7: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

The Data Explosion


Drowning in a sea of bits

Storage costs are stabilizing[5]

$0.02/GB

Page 8: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

The Data Explosion


Drowning in a sea of bits

But data volume is still increasing!

Global IP traffic per year [6]:
•  2016: 1.1 ZB (>1 billion GB/month)
•  2019: 2 ZB

Information created [7]:
•  2011: 1.8 ZB
•  2012: 2.8 ZB
•  2020: 40 ZB

Page 9: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

The Data Explosion


Mo’ data, mo’ problems.

Page 10: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

TRUNCATE is also a four-letter word. (So is DROP…) The Data Explosion


What to do?
•  Delete
   •  Some organizations afraid to delete anything
   •  Creation velocity still a problem
•  Collect less?
•  Pray to the storage gods?
•  Panic!
•  Spend the money, buy more storage
   •  May be inevitable
   •  ROI and efficiency still matter

Page 11: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

Trading CPU cycles for disk space since 2015 The Data Explosion


Compression to the rescue!
•  Well, sort of.
   •  Workload matters.
   •  Structure of data matters.
•  Decrease velocity of data growth
•  Thank you, Gordon Moore!

Page 12: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

Compressed pins are compressed. The Data Explosion


Pinterest, 12 months ago:
•  Lots of data stored as JSON blobs
•  Workload is read-heavy, but not overall QPS-heavy
•  No compression being used
•  i2.4xlarge for DB servers (3TB of disk)
•  Estimated disk space exhaustion around the end of Q1 2016
   •  More servers?
   •  Bigger servers?
   •  Panic?

Page 13: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

Compressed pins are compressed. The Data Explosion


Pinterest, today:
•  Pin data still stored as JSON blobs
•  i2.4xlarge for DB servers (3TB of disk)
•  Workload profile hasn’t changed much
•  InnoDB page compression being used
   •  Approximately 50% space reduction
   •  Reduction in data growth velocity
   •  Disk space exhaustion estimated Q2 2017
•  Still looking for ways to do more with our existing resources

Page 14: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

Entropy is more than just the heat death of the universe. Stand Back, I’m Going To Math


Entropy: A mathematical measure of information or uncertainty.
•  Computed as a function of a probability distribution.
•  Claude Shannon (1948): A Mathematical Theory of Communication

More formally: Suppose X is a discrete random variable which takes on values from a finite set (also written X), where each value x occurs with probability P(x). Then the entropy of the random variable X is defined to be:

H(X) = -\sum_{x \in X} P(x) \log_2 P(x)
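To make the definition concrete, here is a minimal Python sketch that evaluates the entropy formula for an explicit distribution and also estimates it from observed symbol frequencies. The function names are illustrative only and do not come from the talk.

```python
import math
from collections import Counter

def shannon_entropy(probabilities):
    """Return H(X) in bits for probabilities that sum to 1."""
    # Terms with p == 0 contribute nothing (p * log2(p) tends to 0).
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def empirical_entropy(data):
    """Estimate bits per symbol from the symbol frequencies seen in data."""
    counts = Counter(data)
    total = len(data)
    return shannon_entropy(c / total for c in counts.values())

fair_coin = shannon_entropy([0.5, 0.5])    # maximal uncertainty for two outcomes
biased_coin = shannon_entropy([0.9, 0.1])  # more predictable, so lower entropy
```

A fair coin carries one full bit per flip, while the biased coin carries less than half a bit, which is the room a compressor has to work with.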

Page 15: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

Encoding to binary strings for fun and profit Stand Back, I’m Going To Math


An encoding is a function that maps elements from the set X to the set of finite binary strings.

f : X \to \{0,1\}^*

Extend this to finite sequences (strings) of elements:

f(x_1 x_2 x_3 \ldots x_k) = f(x_1) \| f(x_2) \| f(x_3) \| \ldots \| f(x_k)

where || is the concatenation operator. So, we can really think of the encoding like this:

f : X^* \to \{0,1\}^*

For a given set X, there are infinitely many encodings. Why?

Page 16: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

But not just any encoding will do. Stand Back, I’m Going To Math


•  Injective
   •  Guarantees an unambiguous decoding
•  Prefix-free
   •  Allows sequential decoding, no memory required
   •  An encoding is prefix-free if there do not exist distinct elements x, y in X and a string S in {0,1}* such that f(x) = f(y) || S
•  Lossless
   •  Informally, exactly what it sounds like – given an encoded string E, we can decode it back precisely into the original string S
•  Efficient!
   •  Use as few bits as possible to encode each string.
   •  How low can we go?
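The properties above can be checked mechanically. The following Python sketch, using made-up three-symbol codes, tests whether a code is prefix-free, encodes a string by concatenation, and decodes it bit by bit without lookahead. All names here are hypothetical.

```python
def is_prefix_free(code):
    """True if no codeword is a prefix of (or equal to) another codeword."""
    words = sorted(code.values())
    # After sorting, any prefix relationship appears between neighbours.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

def encode(code, symbols):
    """Extend the per-symbol code to strings by concatenation."""
    return "".join(code[s] for s in symbols)

def decode(code, bits):
    """Sequentially decode; this only works reliably for prefix-free codes."""
    inverse = {word: sym for sym, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)

good = {"a": "0", "b": "10", "c": "11"}
bad = {"a": "0", "b": "01", "c": "11"}  # "0" is a prefix of "01"
```

With `good`, `decode(good, encode(good, "abcab"))` returns the original string. With `bad`, the decoder would emit "a" as soon as it saw the first bit of "01", which is exactly the ambiguity the prefix-free property rules out.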

Page 17: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

A little theory before some practice. Stand Back, I’m Going To Math


One more definition. Suppose that we have a string x_1 \ldots x_k such that each x_i in the string occurs independently according to a specified probability distribution. The probability of any such string (note that the elements of the string do not need to be distinct) is given by:

P(x_1 \ldots x_k) = \prod_{i=1}^{k} P(x_i)

This is just basic probability. Consider a fair coin that gets flipped twice. Possible outcomes are: HH, HT, TH, TT, each with probability 1/2 × 1/2 = 1/4.
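The coin example can be reproduced in a few lines of Python. This is only a sketch of the product rule under the independence assumption; `string_probability` is an illustrative name.

```python
from itertools import product
from math import prod

def string_probability(string, dist):
    """Probability of a string whose symbols are drawn independently from dist."""
    return prod(dist[symbol] for symbol in string)

fair = {"H": 0.5, "T": 0.5}

# Enumerate every outcome of two flips together with its probability.
outcomes = {
    "".join(flips): string_probability(flips, fair)
    for flips in product("HT", repeat=2)
}
```

Each of the four outcomes gets probability 0.25, and the probabilities sum to 1, as they must for a complete set of outcomes.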

Page 18: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

CAT BREAK! Stand Back, I’m Going To Math


Page 19: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

Efficiency cat likes short strings Stand Back, I’m Going To Math


The efficiency of a particular encoding f is defined as the weighted average length of an encoding of an element of X.

\ell(f) = \sum_{x \in X} P(x) \, |f(x)|

Where |y| denotes the length of string y.
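As a small worked example, the sketch below computes the weighted average length for two hypothetical codes over the same three-symbol distribution: a fixed-width code and a variable-length code that gives the most likely symbol the shortest codeword.

```python
def average_length(code, dist):
    """Weighted average codeword length, in bits per symbol."""
    return sum(dist[s] * len(code[s]) for s in dist)

dist = {"a": 0.5, "b": 0.25, "c": 0.25}
fixed = {"a": "00", "b": "01", "c": "10"}     # every symbol costs 2 bits
variable = {"a": "0", "b": "10", "c": "11"}   # the common symbol costs 1 bit

fixed_bits = average_length(fixed, dist)
variable_bits = average_length(variable, dist)
```

The fixed-width code averages 2 bits per symbol and the variable-length code averages 1.5, which for this particular distribution equals its entropy.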

Page 20: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

Putting it all together Stand Back, I’m Going To Math


Source Coding Theorem (informally stated): A string S of length N, whose elements are drawn from X according to a probability distribution with entropy H(X), can be compressed into more than N*H(X) bits with negligible risk of data loss as N → ∞, but it cannot be compressed into fewer than N*H(X) bits without virtually guaranteeing data loss.

H (X) ≤ ℓ( f )< H (X)+1

What does this mean? It provides a bound on encoding efficiency for lossless compression algorithms.

Proof is left as an exercise to the reader. But you can use Huffman coding to actually find an efficient code that satisfies the above.
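For readers who want to see the bound in action, here is a compact Huffman construction in Python. It is a textbook sketch, not code from the talk, and the five-symbol distribution is invented for illustration.

```python
import heapq
import math
from itertools import count

def huffman_code(dist):
    """Build a binary Huffman code for a {symbol: probability} mapping."""
    if len(dist) == 1:
        return {next(iter(dist)): "0"}
    tiebreak = count()  # keeps heap entries comparable when probabilities tie
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in dist.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least likely subtrees, prefixing their codewords.
        p0, _, left = heapq.heappop(heap)
        p1, _, right = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in left.items()}
        merged.update({s: "1" + w for s, w in right.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

def entropy(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def average_length(code, dist):
    return sum(dist[s] * len(code[s]) for s in dist)

dist = {"a": 0.45, "b": 0.25, "c": 0.15, "d": 0.10, "e": 0.05}
code = huffman_code(dist)
H = entropy(dist)
L = average_length(code, dist)
```

For this distribution the average codeword length `L` lands between `H` and `H + 1`, as the inequality requires. The gap between `L` and `H` is the price of using a whole number of bits per symbol.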

Page 21: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

Looking at things differently Stand Back, I’m Going To Math


It’s not possible to have an average information content of more than one bit per bit of message without losing data. On average, English text has roughly one bit of entropy per letter.[8] ASCII is an 8-bit encoding. It should come as no surprise that English text compresses quite well.

Page 22: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

The last slide on theory, I promise Stand Back, I’m Going To Math


We don’t necessarily have to think of individual letters.
-  Bigrams, trigrams
-  Words or tokens (think about SQL keywords or a JSON document)

Some strings come out smaller when compressed. Some come out larger. There’s no universal encoding that works equally well for every set of source strings.
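A quick way to see both outcomes is to run the same compressor over redundant text and over random bytes. The sketch below uses Python's standard `zlib` module as a stand-in compressor; the repeated sentence is an artificially easy case chosen only to make the contrast obvious.

```python
import os
import zlib

# Highly redundant input: the same sentence repeated many times.
english = ("the quick brown fox jumps over the lazy dog " * 200).encode("ascii")

# Random bytes of the same length have essentially no redundancy to exploit.
random_bytes = os.urandom(len(english))

english_ratio = len(zlib.compress(english)) / len(english)
random_ratio = len(zlib.compress(random_bytes)) / len(random_bytes)
```

The repetitive text shrinks to a small fraction of its size, while the random input typically comes out slightly larger than it went in because of the stream's header and framing overhead.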

Page 23: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

•  “Old” compression technology
   •  Application layer
   •  SQL functions: COMPRESS() / UNCOMPRESS()
   •  ARCHIVE storage engine
   •  InnoDB page compression
•  “New” compression technology
   •  TokuDB
   •  MyRocks
   •  MySQL 5.7 “punch hole” transparent compression
   •  Server-level column compression… what?!

So Many Options, So Little CPU!


Compression sounds great! I want some for my database, too.

Page 24: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

Don’t Try This At Home


Just because you can do something doesn’t mean you should.

Application-Level Compression

The Good:
•  Not limited in choice of algorithm
•  Scales horizontally with app servers
•  Minimizes network traffic
•  Works with any storage engine
•  Fine-grained control over what to compress and what to leave alone

The Bad:
•  Might require a lot of code retrofit
•  Significant operational overhead in the event of incidents
•  Potentially significant loss of SQL functionality
   •  WHERE clauses on compressed data
   •  SQL functions
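A minimal sketch of what application-level compression can look like, assuming the application serializes a JSON document and stores the result in a BLOB column. The `pack` and `unpack` helpers and the pin fields are hypothetical, and `zlib` is just one possible algorithm.

```python
import json
import zlib

def pack(document):
    """Serialize a JSON-compatible object and compress it for a BLOB column."""
    raw = json.dumps(document, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw, 6)

def unpack(blob):
    """Reverse pack(): decompress and parse back into Python objects."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))

# Hypothetical pin document; the field names are invented for illustration.
pin = {
    "id": 12345,
    "board": "cats",
    "description": "a cat sitting in a box " * 20,
    "tags": ["cat", "box"],
}
blob = pack(pin)
```

The database only ever sees `blob` as opaque bytes, which is why WHERE clauses and SQL functions on the contents stop working, and why the bytes on the wire are already small by the time they leave the app server.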

Page 25: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

Unless you’re Batman. Then be Batman. Don’t Try This At Home


When might you consider it?
•  New projects, maybe
•  Existing projects, maybe not
•  The data to be compressed doesn’t need anything more than store/retrieve
•  You’re OK with the output of ‘SHOW PROCESSLIST’ screwing up your terminal
•  Network bandwidth is at a premium but CPU is plentiful (MySQL on Mars?)

Page 26: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

Don’t Try This At Home


You’re not Batman.

SQL Function Compression (COMPRESS/UNCOMPRESS)

The Good:
•  Works with any storage engine
•  Fine-grained control over what to compress and what to leave alone

The Bad:
•  All of the same negatives as application-level compression, but without any of the major benefits.
•  Extra load on the MySQL server

When might you consider it?
•  For any serious project, probably never
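For context on what the server is doing, the MySQL manual describes the output of COMPRESS() as a 4-byte uncompressed length (low byte first) followed by the zlib-compressed data, with a trailing "." appended if the result would end in a space. The Python sketch below mimics that layout; it is an approximation for illustration and is not guaranteed to be byte-identical to what a given server version produces.

```python
import struct
import zlib

def mysql_style_compress(data: bytes) -> bytes:
    """Mimic the documented COMPRESS() layout: length header + zlib stream."""
    if not data:
        return b""  # empty strings are stored as empty strings
    packed = struct.pack("<I", len(data)) + zlib.compress(data)
    if packed.endswith(b" "):
        packed += b"."  # guards against trailing-space trimming
    return packed

def mysql_style_uncompress(blob: bytes) -> bytes:
    """Reverse the layout above and verify the length header."""
    if not blob:
        return b""
    (expected_length,) = struct.unpack("<I", blob[:4])
    # decompressobj() tolerates the optional trailing "." after the stream.
    data = zlib.decompressobj().decompress(blob[4:])
    if len(data) != expected_length:
        raise ValueError("length header does not match decompressed size")
    return data

payload = b'{"id": 1, "note": "' + b"x" * 100 + b'"}'
stored = mysql_style_compress(payload)
```

Since the format is this simple, the same work can be done on the application servers instead, which is one reason the server-side functions offer the drawbacks of application-level compression without its scaling benefits.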

Page 27: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

Don’t Try This At Home


Included for the sake of completeness only

ARCHIVE Storage Engine

The Good:
•  Convenient
•  Mature

The Bad:
•  No UPDATE or DELETE
•  SELECT is a table scan
•  Not a usable general-purpose engine

When might you consider it?
•  Data that never needs to be updated and is rarely accessed
•  Data that can be lost or regenerated in an emergency

Page 28: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

Honey, I shrunk the database!

Not Your Grandfather’s GZIP


InnoDB Page Compression (pre-5.7)

The Good:
•  Mature
•  No need to retrofit code
•  Decent compression ratio
•  Reasonably performant for many things

The Bad:
•  Memory inefficient
•  Not as space-efficient as it could be
•  Not much configurability

When might you consider it?
•  Read-mostly workloads of low to moderate concurrency
•  For many users, it’s still the only game in town

Page 29: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

Eh.

Not Your Grandfather’s GZIP


InnoDB Punch-Hole Compression (5.7+)

The Good:
•  Configurable choice of algorithm
•  No need to retrofit code
•  No more buffer pool inefficiency

The Bad:
•  Immature
•  Crashed my test server
•  FS fragmentation
•  Doesn’t seem to play well with XFS

When might you consider it?
•  Maybe 5.8, but that’s just my opinion.
•  Maybe if you’re using FusionIO NVMFS

Page 30: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

Hole-punching revisited (or, how I learned to stop worrying and love deadlocks) Not Your Grandfather’s GZIP


InnoDB Punch-Hole Compression (5.7+) continued.

Lots of this in dmesg:

    [203516.812112] XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)

CPUs reporting nontrivial IO wait and nothing else:

    05:54:38 PM  CPU   %usr  %nice   %sys  %iowait   %irq  %soft  %steal  %guest   %idle
    05:54:39 PM  all   0.31   0.00   0.00     6.20   0.00   0.00    0.00    0.00   93.49
    05:54:39 PM    0   1.00   0.00   0.00    13.00   0.00   0.00    0.00    0.00   86.00
    05:54:39 PM    1   1.00   0.00   0.00    12.00   0.00   0.00    0.00    0.00   87.00
    05:54:39 PM    2   0.00   0.00   0.00    12.00   0.00   0.00    0.00    0.00   88.00
    05:54:39 PM    3   0.00   0.00   0.00    10.00   0.00   0.00    0.00    0.00   90.00
    05:54:39 PM    4   1.00   0.00   0.00    12.00   0.00   0.00    0.00    0.00   87.00
    05:54:39 PM    5   0.00   0.00   0.00    13.13   0.00   0.00    0.00    0.00   86.87
    05:54:39 PM    6   3.00   0.00   1.00    11.00   0.00   0.00    0.00    0.00   85.00
    05:54:39 PM    7   0.00   0.00   0.00    14.14   0.00   0.00    0.00    0.00   85.86
    05:54:39 PM    8   1.00   0.00   0.00     0.00   0.00   0.00    0.00    0.00   99.00
    05:54:39 PM    9   0.00   0.00   0.00     0.00   0.00   0.00    0.00    0.00  100.00
    05:54:39 PM   10   0.00   0.00   0.00     0.00   0.00   0.00    0.00    0.00  100.00
    05:54:39 PM   11   0.00   0.00   0.00     0.00   0.00   0.00    0.00    0.00  100.00
    05:54:39 PM   12   0.00   0.00   0.00     0.00   0.00   0.00    0.00    0.00  100.00
    05:54:39 PM   13   0.00   0.00   0.00     0.00   0.00   0.00    0.00    0.00  100.00
    05:54:39 PM   14   0.00   0.00   0.00     0.00   0.00   0.00    0.00    0.00  100.00
    05:54:39 PM   15   0.00   0.00   0.00     0.00   0.00   0.00    0.00    0.00  100.00

Page 31: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

What does Tokutek mean, anyway?

Not Your Grandfather’s GZIP


TokuDB

The Good:
•  Fully transactional
•  Very good compression ratio
•  Optimized for high write volume
•  Code changes not likely needed

The Bad:
•  Reads can be slower than InnoDB
•  MySQL’s datadir becomes a mess
•  Some InnoDB constructs unsupported
•  Limited MySQL community knowledge

When might you consider it?
•  Lower-end storage technology (slow SSD vs. Flash)
•  Data that can benefit from multiple clustering indexes (time series data, perhaps)
•  Dedicated server (no InnoDB)

Page 32: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

Get your rocks on!

Not Your Grandfather’s GZIP


RocksDB (MyRocks)

The Good:
•  Fully transactional
•  Good compression ratio
•  Optimized for high write volume
•  Generally very fast
•  Low write amplification

The Bad:
•  Not GA yet.
•  Currently only available as part of Facebook MySQL 5.6
•  Some InnoDB constructs unsupported
•  Locking behavior different from InnoDB

When might you consider it?
•  Need high compression ratio
•  Concerned about SSD burnout
•  Once it becomes available separately from FB-MySQL

Page 33: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

Hey, I didn’t see THAT in the manual

Not Your Grandfather’s GZIP


InnoDB Column Compression

The Good:
•  Configurable compression dictionary
•  Very good compression ratio possible
•  Excellent performance under load
•  Very memory-efficient

The Bad:
•  Not yet released to the public (not GA)

When should you consider it?
•  Storage of a lot of JSON, XML, or other compressible BLOB data
•  After it becomes GA
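To illustrate why a predefined dictionary helps with small JSON documents, the sketch below uses the preset-dictionary feature of Python's `zlib` module. This is only an analogy for the idea: the slides do not say how the server implements column compression, and the pin fields here are invented.

```python
import json
import zlib

def compress_with_dict(data: bytes, dictionary: bytes) -> bytes:
    """Deflate data with a preset dictionary shared by writer and reader."""
    comp = zlib.compressobj(zdict=dictionary)
    return comp.compress(data) + comp.flush()

def decompress_with_dict(blob: bytes, dictionary: bytes) -> bytes:
    """Inflate data that was compressed with the same preset dictionary."""
    decomp = zlib.decompressobj(zdict=dictionary)
    return decomp.decompress(blob) + decomp.flush()

# Hypothetical "one pin" dictionary: a sample document with typical field names.
sample_pin = {"id": 1, "board_id": 2, "description": "", "link": "https://example.com/",
              "image_url": "https://example.com/", "created_at": "2016-01-01"}
dictionary = json.dumps(sample_pin).encode("utf-8")

new_pin = {"id": 987654, "board_id": 4242, "description": "grumpy cat in a box",
           "link": "https://example.com/cats", "image_url": "https://example.com/cat.jpg",
           "created_at": "2016-04-19"}
raw = json.dumps(new_pin).encode("utf-8")

without_dict = zlib.compress(raw)
with_dict = compress_with_dict(raw, dictionary)
```

A short document has little internal repetition, so plain deflate gains almost nothing, while the dictionary lets the compressor reference the field names it has effectively already seen. The trade-off is that reader and writer must agree on the exact dictionary bytes.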

Page 34: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

But first… A CAT. Ooh, Shiny Numbers!


Page 35: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

There are so many of them Ooh, Shiny Numbers!


Recall that we’ve already gone from uncompressed to InnoDB page compression
•  Performance is good
•  We think we can do better on disk space efficiency

However…
•  Not going to engage in massive code rewrite
•  ARCHIVE engine isn’t relevant to us
•  MyRocks isn’t yet in a state where we’d spend significant time on it

So…
•  Page compression
•  Column compression without dictionary
•  Column compression with dictionary of various sizes
•  TokuDB
•  Punch-hole (or not...)

Page 36: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

Servers, start your engines Ooh, Shiny Numbers!


Choose a typical ‘pins’ shard, of which there are thousands. Call it N.
•  Shard N contains about 20GB of raw, uncompressed data
•  InnoDB page compression brings this down to around 10GB
   •  Up to 20% fragmentation overhead
   •  Run ‘OPTIMIZE TABLE’ and we go down to 8.4GB – this is our starting point
•  Set up several test servers with various compression configurations

Server A: page compressed – the control
Server B: column compression, no dictionary
Server C: column compression, one pin dictionary
Server D: column compression, four pin dictionary
Server E: column compression, eight pin dictionary
Server F: column compression, 32K dictionary
Server G: TokuDB, default settings

Page 37: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

They don’t lie. And 65% of all statistics are made up. Ooh, Shiny Numbers!


Metric                     Server A   Server B   Server C   Server D   Server E   Server F   Server G
Size (GB)                  8.4        8.2        5.4        5.4        5.4        5.2        3.6
Dump rate (rows/sec)       52.2K      33.3K      34.3K      32.4K      30.6K      25K        53.5K
Replication (1 thread)     2:40       2:52       2:35       2:57       2:47       3:00       6:36
Replication (16 threads)   0:19       0:19       0:21       0:19       0:19       0:22       1:46
RO QPS (16 threads)        35K        40K-50K    40K-50K    40K-50K    40K-50K    40K-50K    20K
RO P99.9999                10ms       10ms       10ms       10ms       10ms       10ms       40ms
RW QPS (16 threads)        25K-30K    30K-40K    30K-40K    30K-40K    30K-40K    30K-40K    18K
RW P99.9999                30ms       25ms       25ms       25ms       25ms       25ms       40ms
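The size row is easier to compare as relative savings. The short Python sketch below computes each server's reduction against the page-compressed control (Server A) and its ratio against the roughly 20GB raw shard quoted earlier; since that raw figure is approximate, the ratios are only indicative.

```python
# On-disk sizes in GB, taken from the "Size (GB)" row of the results table.
sizes_gb = {"A": 8.4, "B": 8.2, "C": 5.4, "D": 5.4, "E": 5.4, "F": 5.2, "G": 3.6}
raw_gb = 20.0             # "about 20GB" of raw data, so treat ratios as approximate
baseline = sizes_gb["A"]  # Server A, page compression, is the control

# Percent reduction relative to the page-compressed control.
savings_vs_page = {
    server: round(100 * (1 - size / baseline), 1)
    for server, size in sizes_gb.items()
}

# Compression ratio relative to the uncompressed shard.
ratio_vs_raw = {server: round(raw_gb / size, 2) for server, size in sizes_gb.items()}
```

Read this way, a single-pin dictionary (Server C) saves roughly a third over page compression, the 32K dictionary (Server F) adds a little more, and TokuDB (Server G) saves well over half.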

Page 38: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

Replication resync rate, single thread Ooh, Shiny Numbers!


Page 39: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

Replication resync rate, 16-thread MTS Ooh, Shiny Numbers!


Page 40: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

Interpreting the images on the pages to come

For the graphs on the next several slides: •  Server A (page compression) is RED

•  Server B (column compression, no dictionary) is LIGHT GREEN

•  Server C (column compression, one pin) is BLUE

•  Server D (column compression, four pins) is LIGHT BLUE

•  Server E (column compression, eight pins) is DARK RED

•  Server F (column compression, 32K of pins) is PURPLE

•  Server G (TokuDB) is GOLD/YELLOW

A Key to the Graphics Kingdom


Page 41: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

SELECT 256, 128, 32, 16, 8, 4, 1 threads(pquery) Ooh, Shiny Numbers


Page 42: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

p99.9 Read Performance (Log Scale y-axis) Ooh, Shiny Numbers


Page 43: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

Read performance for ALL the 9s! (p99.9999) Ooh, Shiny Numbers


Page 44: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

Read/write QPS for 16, 8, 4, 1, 32, 64, 128 threads Ooh, Shiny Numbers


Page 45: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

P99.9 write performance for the previous graph (log10 scale) Ooh, Shiny Numbers


Page 46: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

P99.9999 overall performance for the previous QPS (r/w) graph (log10 scale) Ooh, Shiny Numbers


Page 47: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

What’d we get out of this? Summary Results


•  Even with just the simplest predefined dictionary (a single pin, which captures all of the JSON field names), we get dramatically improved space efficiency. With a better dictionary, we can likely do even better, and at our scale, a few percent can be a nontrivial improvement.

•  At low concurrency (running threads <= number of cores), there isn’t too much difference between column compression and page compression when it comes to performance.

•  At higher concurrency (number of running threads > number of cores in the machine), page compression falls over pretty badly on the read-only test. Column compression continues working quite well up to 256 active threads and perhaps even higher.

•  TokuDB wins on compression easily, but otherwise doesn’t do that well for our workload in a default configuration (and with all the other tables on the server still InnoDB).

•  Column compression looks like a serious winner, at least for what we need. I don’t think we’ll be the only ones.

Page 48: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016

Credit where credit is due. Notes & References


[1] http://techcrunch.com/2010/08/04/schmidt-data/

[2] http://nymag.com/scienceofus/2015/06/heres-a-study-about-internet-cats.html

[3] https://www.domo.com/blog/2012/06/how-much-data-is-created-every-minute/

[4] https://www.domo.com/blog/2014/04/data-never-sleeps-2-0/

[5] http://www.mkomo.com/cost-per-gigabyte-update

[6] http://www.cisco.com/c/en/us/solutions/collateral/service-provider/ip-ngn-ip-next-generation-network/white_paper_c11-481360.html

[7] http://www.webopedia.com/quick_ref/just-how-much-data-is-out-there.html

[8] http://people.seas.harvard.edu/~jones/cscie129/papers/stanford_info_paper/entropy_of_english_9.htm

Page 49: Less Is More: Novel Approaches to MySQL Compression for Modern Data Sets - Percona Live 2016


Questions? Answers! email: [email protected] | twitter: @denshikarasu | pinterest engineering blog: https://engineering.pinterest.com

We are hiring! https://careers.pinterest.com