logical sharding at etsy

Post on 16-Jul-2015

435 Views

Category:

Software

4 Downloads

Preview:

Click to see full reader

TRANSCRIPT

“Just” shard it Logical sharding at Etsy

Maggie Zhou @zmagg

ScaleConf 2015

@zmagg

What’s the infrastructure?

@zmagg

L A M P

@zmagg

L A M P

@zmagg

@zmagg

2019

http://surge.omniti.com/2011/speakers/ross-snyder

@zmagg

L A M P

yay, databases!

@zmagg

tickets index

shard 1 shard 2 shard 3 shard n

@zmagg

??

@zmagg

master-master replication

@zmagg

tickets index

shard 1 shard 2 shard 3 shard n

ORM

@zmagg

tickets index

shard 1 shard 2 shard 3 shard n

shard n+1

@zmagg

Capacity planning included setting aside

2 months for load balancing

@zmagg

2 months??

@zmagg

2010’s solution is Not Scaling

@zmagg

shop_1 shop_2 shop_3 shop_4

shop_2

shard N shard N+1

index

shop_2

…writes locked

1)

2)

3) update index, remove lock

@zmagg

Migrations were• error prone

• arbitrary

• developers had to be aware

• created orphaned data

• locked shops & users out of changes for up to hours

• slow

@zmagg

Migrations wereError prone? We can fix the errors! We can make the script more robust!

Arbitrary? We can write tooling that figures out which rows are right to migrate off for optimal balance!

Developers had to be aware? We can write better interfaces!

@zmagg

Orphaned data?• Deletes are expensive, so we didn’t do them.

• Migrations created orphaned data on old hosts that were still picked up by full table scans (downstream systems: search, analytics).

!

@zmagg

Migrations were• error prone

• arbitrary

• developers had to be aware

• created orphaned data

• locked shops & users out of changes for up to hours

• slow

@zmagg

Let’s talk about slowness…

What if we could move more than one row at a time?

@zmagg

<<enter logical sharding>>

@zmagg

Well, okay, why didn’t we do this in the first

place?

@zmagg

You have to run your site to learn your data

access patterns.

@zmagg

photo from gizmodo http://gizmodo.com/5632095/justin-bieber-has-dedicated-servers-at-twitter

@zmagg

listing from https://www.etsy.com/shop/NausicaaDistribution

@zmagg

listing from https://www.etsy.com/shop/Mugnificentart

@zmagg

Scaling Pinterest, 2012 “to increase capacity, a server is replicated and the new replica is responsible for some DBs”

slide 32 https://speakerdeck.com/yashh/scaling-pinterest

Sharding & IDs at Instagram, 2012 “simply by moving a set of logical shards from one database to another"

http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram

@zmagg

shard1 shard2

… shard 10

db_host1 db_host2 db_host3 db_hostN

shard1 shard2

… shard10

shard11 shard12

… shard20

shard21 shard22

… shard30

!

!

@zmagg

db_host1

shard1 shard2

… shard10

db_host2

shard2

@zmagg

How did we move onto this new architecture?

@zmagg

We used the old row-based migration framework, one last time.

We migrated all the sharded data (120TB) without

downtime or developer disruption.

@zmagg

@zmagg

It took us 5 months to move that data using

the old way.

@zmagg

Today it would take us a few hours.

@zmagg

What’d we build?• made tooling logical shard aware

• generated database configs so that we could have two-way O(1) mappings between logical shards and physical hosts.

@zmagg

DNS related outage…

@zmagg

Nice side effects!• schema changes (alters) now run in parallel,

significantly faster!

• no more orphaned data!

• downstream analytics systems replicate faster at the shard-by-shard level!

@zmagg

The Future

Do you have any limits to how many shard hosts you can add?

@zmagg

Well, yes…It’s 999…because of technical debt.

Code base riddled with checks like this:

!

!

(We can support ~16x data growth, on current day hardware)

@zmagg

2019

http://surge.omniti.com/2011/speakers/ross-snyder

@zmagg

Resources• Using a tickets database for ID generation: http://code.flickr.net/2010/02/08/ticket-servers-distributed-

unique-primary-keys-on-the-cheap/

• Using master-master replication: https://codeascraft.com/2012/04/20/two-sides-for-salvation/

• Etsy’s shard architecture, the 2012 edition: http://www.percona.com/live/mysql-conference-2012/sessions/etsy-shard-architecture-starts-s-and-ends-hard

• Scaling Etsy, 2011: http://surge.omniti.com/2011/speakers/ross-snyder

• Instagram’s shard & id architecture: http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram

• Scaling Pinterest: https://speakerdeck.com/yashh/scaling-pinterest

• Morgue, Etsy’s postmortem tool https://github.com/etsy/morgue

Questions?

“Just” shard it Logical sharding at Etsy

Maggie Zhou @zmagg

ScaleConf 2015

top related