20141206 4 q14_dataconference_i_am_your_db

Upload: hyeongchae-lee

Post on 12-Jul-2015

TRANSCRIPT

Page 1: 20141206 4 q14_dataconference_i_am_your_db

I’m your DB (I need a database that scales)

FB/hyeongchae.lee

4Q14 DataConference.IO 1

Page 2: 20141206 4 q14_dataconference_i_am_your_db

I’m your DB! May the oracle be with you

Page 3: 20141206 4 q14_dataconference_i_am_your_db

Agenda

• About me

• DBMS vs NoSQL

• Local vs Global

• So... which databases scale?

• Amazon Aurora

Page 4: 20141206 4 q14_dataconference_i_am_your_db

ABOUT ME

----------------------------

Page 5: 20141206 4 q14_dataconference_i_am_your_db

INERVIT: MobileLite

nhn: CUBRID

TELCOWARE: Telcobase

ALTIBASE: Altibase

TIBERO: Tibero

Page 6: 20141206 4 q14_dataconference_i_am_your_db

Page 7: 20141206 4 q14_dataconference_i_am_your_db

Global Open Frontier Full-time

• Project : MySQL Redis Plug-in ( +MariaDB, +MaxScale )

– https://github.com/sql2/MySQL_Redis_Plugin_Dev

Page 8: 20141206 4 q14_dataconference_i_am_your_db

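MySQL Memcached Plug-in

[Diagram: inside mysqld, two access paths from the Application to the InnoDB Storage Engine]

• SQL: Application → MySQL Server → Handler API → InnoDB Storage Engine

• Memcached protocol: Application → Memcached plugin (innodb_memcache, local cache (optional)) → InnoDB API → InnoDB Storage Engine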

Page 9: 20141206 4 q14_dataconference_i_am_your_db

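MySQL Redis Plug-in

[Diagram: inside mysqld, two access paths from the Application to the InnoDB Storage Engine]

• SQL: Application → MySQL Server → Handler API → InnoDB Storage Engine

• Redis protocol: Application → Redis plugin (innodb_redis, local cache (optional)) → InnoDB API → InnoDB Storage Engine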

Page 10: 20141206 4 q14_dataconference_i_am_your_db

2015 : MaxScale Redis Cluster Plug-in

URL : https://mariadb.com/blog/maxscale-proxy-mysql-replication-relay

Page 11: 20141206 4 q14_dataconference_i_am_your_db

DBMS VS NoSQL

Page 12: 20141206 4 q14_dataconference_i_am_your_db

DB-Engines Ranking (2014.11.24)

Rank  Last Month  DBMS                  Database Model     Score    Changes
1     1           Oracle                Relational DBMS    1452.13  -19.77
2     2           MySQL                 Relational DBMS    1279.08  +16.11
3     3           Microsoft SQL Server  Relational DBMS    1220.20  +0.59
4     4           PostgreSQL            Relational DBMS    257.36   -0.36
5     5           MongoDB               Document store     244.73   +4.33
6     6           DB2                   Relational DBMS    206.23   -1.44
7     7           Microsoft Access      Relational DBMS    138.84   -2.80
8     8           SQLite                Relational DBMS    95.28    +0.33
9     10          Cassandra             Wide column store  91.99    +6.29
10    9           Sybase ASE            Relational DBMS    84.62    -2.17

http://db-engines.com/en/ranking

Page 13: 20141206 4 q14_dataconference_i_am_your_db

http://db-engines.com/en/ranking_categories

Page 14: 20141206 4 q14_dataconference_i_am_your_db

Winner !!

Page 15: 20141206 4 q14_dataconference_i_am_your_db

Magic Quadrant for Operational Database Management Systems

1 Oracle's Letter to the EU Concerning MySQL

After an antitrust investigation, the European Commission approved Oracle's acquisition of Sun Microsystems, including MySQL, on 21 January 2010.

Wikileaks subsequently published cables indicating that the Obama administration applied pressure to the EU to approve the deal.

Concerns about the MySQL acquisition had been addressed in Oracle's 14 December 2009 pledges to customers, which were to extend for five years — thus expiring in early 2015.

Oracle's pledges included commitments to maintain certain APIs, extensions of licenses to then-current licensees, continued use of GPL licensing, and others. The expiration of these commitments may change the nature of Oracle's relationships with a number of hardware and software vendors, as well as its posture regarding product investment, support for purchasing requirements, and other aspects of MySQL's business model.

Page 16: 20141206 4 q14_dataconference_i_am_your_db

LOCAL VS GLOBAL

Page 17: 20141206 4 q14_dataconference_i_am_your_db

Korea vs Japan

50M vs 127M

Page 18: 20141206 4 q14_dataconference_i_am_your_db

Korea vs Japan

[Diagram: two replication groups, each with one Master and three Slaves; one group is marked x3]

Page 19: 20141206 4 q14_dataconference_i_am_your_db

KakaoTalk vs LINE

Page 20: 20141206 4 q14_dataconference_i_am_your_db

KakaoTalk vs LINE

Page 21: 20141206 4 q14_dataconference_i_am_your_db

We Love FusionIO !!

• facebook/flashcache

Page 22: 20141206 4 q14_dataconference_i_am_your_db

Dolphinics’ Dolphin Interconnect Solutions

Page 23: 20141206 4 q14_dataconference_i_am_your_db

MEMSCALE

Page 24: 20141206 4 q14_dataconference_i_am_your_db

SO... WHICH DATABASES SCALE?

Page 25: 20141206 4 q14_dataconference_i_am_your_db

Read Caching

• Pros : Read-caching can take over a lot of read operations. If reads make up most of your workload, this will obviously help a lot. Even if you have a heavy write workload, read-caching might be enough to keep you from having to scale out to handle writes.

• Cons : Read-caching, by nature, involves a memory store. If your data-access patterns are really random, or involve a large percentage of records, you might wind up with a pretty expensive memory footprint. Figuring out the right cache-invalidation strategy for your app can also be really tricky. Many memory stores are pretty basic in terms of functionality: lack of support for transactions & joins can mean that you’ll need multiple processing steps or network round-trips between the app & the cache.

http://spiegela.com/2014/04/28/but-i-need-a-database-that-scales-part-1
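
A minimal sketch of the read-caching idea on this slide, assuming a cache-aside pattern with invalidate-on-write. The class name `CacheAside` and the plain dicts standing in for the database and the memory store are illustrative assumptions, not something from the talk.

```python
class CacheAside:
    """Cache-aside reads in front of a slower backing store.

    Both the "database" and the "memory store" are plain dicts here
    (an assumption made to keep the sketch self-contained).
    """

    def __init__(self, backing_store):
        self.backing_store = backing_store  # stands in for the database
        self.cache = {}                     # stands in for the memory store
        self.db_reads = 0                   # reads that actually reached the database

    def get(self, key):
        if key in self.cache:               # cache hit: no database round-trip
            return self.cache[key]
        self.db_reads += 1                  # cache miss: go to the database
        value = self.backing_store.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def put(self, key, value):
        self.backing_store[key] = value     # write goes to the database first
        self.cache.pop(key, None)           # then the stale cache entry is invalidated


db = {"user:1": "alice"}
cached = CacheAside(db)
first = cached.get("user:1")    # miss, served by the database
second = cached.get("user:1")   # hit, served by the cache
```

Invalidating on write is only one possible policy; expiry times or write-through updates are alternatives, and picking between them is the "tricky" part the slide mentions.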

Page 26: 20141206 4 q14_dataconference_i_am_your_db

Write Coalescing

• Pros : In short: you can achieve better throughput of incoming writes. With many caching systems, you can also query the data in the cache, creating a set of real-time use cases including event-processing, triggers & real-time analytics.

• Cons : Coalescing writes will inherently mean that your persistence layer is behind your ingestion layer. To take advantage of this technique, you’ll need to consider a lot of questions:

– Which data to query: cached, persisted, both?

– Does this data need to be made durable (survives a reboot)? How quickly?

– Are there consistency concerns? Unique indices? Atomic transactions?
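
The trade-off above can be sketched as a small write buffer. This is an illustration under assumptions: `CoalescingWriter`, the `max_pending` threshold, and the dict used as the persistence layer are all hypothetical names, and a real system would also have to answer the durability question (the buffer here is lost on a crash).

```python
class CoalescingWriter:
    """Buffer writes in memory and persist them in batches.

    Repeated writes to the same key collapse into one persisted write.
    """

    def __init__(self, store, max_pending=3):
        self.store = store              # stands in for the persistence layer
        self.pending = {}               # ingestion-side buffer, last write wins
        self.max_pending = max_pending  # flush once this many distinct keys are buffered
        self.flushes = 0

    def write(self, key, value):
        self.pending[key] = value
        if len(self.pending) >= self.max_pending:
            self.flush()

    def read(self, key):
        # "Which data to query": this sketch prefers the buffered value.
        if key in self.pending:
            return self.pending[key]
        return self.store.get(key)

    def flush(self):
        if not self.pending:
            return
        self.store.update(self.pending)  # one batched write to the store
        self.pending.clear()
        self.flushes += 1


store = {}
writer = CoalescingWriter(store, max_pending=3)
for i in range(5):
    writer.write("counter", i)           # five writes to one key, nothing persisted yet
buffered = writer.read("counter")        # newest value, served from the buffer
persisted_before = store.get("counter")  # the persistence layer is still behind
writer.flush()
```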

Page 27: 20141206 4 q14_dataconference_i_am_your_db

Connection Scaling

• Pros : Connection scaling increases the number of concurrent connections (obviously, I think?). Its biggest benefit, though, is in reliability, since any cluster node can fail and clients can simply reconnect.

• Cons : Connection Scaling requires shared storage. RAC, for example, typically uses OCFS, a clustered file-system, and SAN storage. The ability to handle more I/O transactions is dependent on scaling up that shared storage tier, which can be very expensive. Connection Scaling also doesn’t help much with capacity or analysis scaling since the data is shared, not spread out across nodes.
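
A rough sketch of the "clients can simply reconnect" point, assuming every node sees the same shared storage. `ClusterClient`, `make_node`, and the node names are hypothetical; this is not how any particular cluster driver works.

```python
shared_storage = {"x": 1}  # every node reads the same data (the shared storage tier)


def make_node(name, up=True):
    """Return a callable that behaves like a connection to one cluster node."""
    def run(key):
        if not up:
            raise ConnectionError(f"{name} is down")
        return name, shared_storage[key]
    return run


class ClusterClient:
    """Try each node in turn and fail over when a connection fails."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def execute(self, key):
        last_error = None
        for node in self.nodes:
            try:
                return node(key)
            except ConnectionError as exc:  # node failed: move on to the next one
                last_error = exc
        raise ConnectionError("all nodes unavailable") from last_error


client = ClusterClient([make_node("node-a", up=False), make_node("node-b")])
served_by, value = client.execute("x")
```

Adding nodes adds connection capacity and failover targets, but every request still lands on `shared_storage`, which is why the slide says I/O and capacity do not scale this way.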

Page 28: 20141206 4 q14_dataconference_i_am_your_db

Master-Slave Replication

• Pros : While there’s some setup involved, it’s pretty seamless to your application. There’s still only a single node that has control over the data, so there are no new concerns around consistency. For read-constrained applications, nodes can be added quickly and the architecture remains relatively simple.

• Cons : MSR solves one problem: read transactions. If you need to scale other aspects, you’re not doing it here. If you need more write throughput, MSR offloads the read transactions from the master, but writes are still limited to a single node. Also, slaves can lag in their updates from the master. If you need absolute consistency between the two, you’ll need to investigate options for synchronous replication, which can impact performance of the master node.
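
A toy model of read/write splitting with asynchronous replication, to make the lag issue concrete. `ReplicatedDatabase` and its explicit `replicate()` step are assumptions for illustration; real replication ships a log continuously rather than on demand.

```python
import itertools


class ReplicatedDatabase:
    """One master takes all writes; reads rotate across slaves."""

    def __init__(self, slave_count):
        self.master = {}
        self.slaves = [{} for _ in range(slave_count)]
        self.log = []  # changes written on the master but not yet applied to slaves
        self._next_slave = itertools.cycle(range(slave_count))

    def write(self, key, value):
        self.master[key] = value         # writes are limited to the single master
        self.log.append((key, value))

    def read(self, key):
        slave = self.slaves[next(self._next_slave)]  # round-robin over the slaves
        return slave.get(key)

    def replicate(self):
        for key, value in self.log:      # apply the pending changes on every slave
            for slave in self.slaves:
                slave[key] = value
        self.log.clear()


db = ReplicatedDatabase(slave_count=2)
db.write("k", "v")
stale = db.read("k")   # the slave has not caught up yet
db.replicate()
fresh = db.read("k")   # now the slave returns the written value
```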

Page 29: 20141206 4 q14_dataconference_i_am_your_db

Vertical Partitioning (aka cluster)

• Pros : Having smaller databases makes indices perform better, and allows you to improve just about any aspect of scaling.

• Cons : If your model requires relationships between most or all of your tables for the basic operations, vertical partitioning may not be a fit. Even when your model fits well into partitions today, having these divisions can impact the flexibility of performing joins across models in the future.
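
A small sketch of what vertical partitioning means for the application, assuming two hypothetical databases (`accounts_db`, `sales_db`) and made-up tables. Once related tables live in different databases, the join has to happen in application code.

```python
# Which database owns which table (hypothetical layout).
TABLE_TO_DATABASE = {"users": "accounts_db", "orders": "sales_db"}

# In-memory stand-ins for the two separate databases.
databases = {
    "accounts_db": {"users": [{"id": 1, "name": "alice"}]},
    "sales_db": {"orders": [{"id": 10, "user_id": 1, "total": 30}]},
}


def rows(table):
    """Fetch all rows of a table from whichever database owns it."""
    return databases[TABLE_TO_DATABASE[table]][table]


def orders_with_user_names():
    """Join orders to users in the application, since no single database holds both."""
    names = {user["id"]: user["name"] for user in rows("users")}
    return [(order["id"], names[order["user_id"]], order["total"])
            for order in rows("orders")]


joined = orders_with_user_names()
```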

Page 30: 20141206 4 q14_dataconference_i_am_your_db

Horizontal Partitioning (aka shard)

• Pros : This type of partitioning provides scaling for all of the elements of scale, allowing for very large data-sets and very good performance.

• Cons : Sharding can have a lot of drawbacks depending on the implementation. For one thing, the client must be aware of the partition key. When implementing sharding in MySQL, for example, an application will typically infer the partition key, and address the desired partition. Increasing the number of nodes, or changing the key, requires an update to the app each time. Other trade-offs like database features are up for grabs too:

– Joins: if my data for two collections is distributed across multiple nodes, when I fetch the data back, I may need to join data across more than one, which is likely to be slower

– Transactions: if I have a transaction that involves two nodes of the cluster, how do I execute it atomically? Do I lock multiple nodes? All of them?

– Bulk commits: if I update records in bulk across multiple nodes, this is really two transactions executed separately.
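
A sketch of application-side shard routing, assuming simple hash-modulo placement (the slide does not say which scheme is meant). `shard_for` and `group_by_shard` are hypothetical helper names.

```python
import zlib
from collections import defaultdict


def shard_for(partition_key, shard_count):
    """Map a partition key to a shard index using a stable hash (CRC32)."""
    return zlib.crc32(partition_key.encode("utf-8")) % shard_count


def group_by_shard(keys, shard_count):
    """Split a bulk operation into one batch per shard."""
    groups = defaultdict(list)
    for key in keys:
        groups[shard_for(key, shard_count)].append(key)
    return dict(groups)


keys = [f"user:{i}" for i in range(1000)]

# A bulk update becomes several independent batches, one per shard.
batches = group_by_shard(keys, shard_count=4)

# Growing from 4 to 5 shards changes the placement of many existing keys.
moved = sum(1 for key in keys if shard_for(key, 4) != shard_for(key, 5))
```

With plain modulo placement and a uniform hash, roughly four keys in five would be expected to move when going from 4 to 5 shards, which is one reason changing the node count is painful. Schemes such as consistent hashing reduce that movement, at the cost of more routing logic.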

Page 31: 20141206 4 q14_dataconference_i_am_your_db

So... which databases scale?

• Scale Out Reads

• Capacity

• Scale Out Analysis

• Scale Out Writes

• Bulk Commits

• Joins

• Transactions

• Durability

• Consistency

Page 32: 20141206 4 q14_dataconference_i_am_your_db

Page 33: 20141206 4 q14_dataconference_i_am_your_db

Scaling Storytime

• http://en.wikipedia.org/wiki/Brad_Fitzpatrick

Page 34: 20141206 4 q14_dataconference_i_am_your_db

One Server

• Simple:

[Diagram: Internet → Apache → MySQL, on a single server]

Page 35: 20141206 4 q14_dataconference_i_am_your_db

Two Servers

• Two SPOF

[Diagram: Internet → Apache → MySQL, with Apache and MySQL on separate servers]

Page 36: 20141206 4 q14_dataconference_i_am_your_db

Five Servers

• Replication!

[Diagram: Internet → three Apache servers; Apache → Master (write); Apache → Slave (read); Master → Slave (replication)]

Page 37: 20141206 4 q14_dataconference_i_am_your_db

More Servers

• Chaos!

[Diagram: Internet → six Apache servers; one Master with six Slaves]

Page 38: 20141206 4 q14_dataconference_i_am_your_db

Cluster vs Shard

Multi-Master

Cluster

Shard

Cluster + Shard

Page 39: 20141206 4 q14_dataconference_i_am_your_db

MySQL Recruit

• Big Table ( X )

Small Table ( O )

• Performance ( X )

Scale-up ( O ) Distributed ( O )

• Query Tuning

hard ...

• Clustering & Sharding

mission ...

Page 40: 20141206 4 q14_dataconference_i_am_your_db

AMAZON AURORA

Page 41: 20141206 4 q14_dataconference_i_am_your_db

http://www.theregister.co.uk/2014/11/26/inside_aurora_how_disruptive_is_amazons_mysql_clone/

Page 42: 20141206 4 q14_dataconference_i_am_your_db

OSSCON 4Q14 42