logical sharding at etsy

52
“Just” shard it Logical sharding at Etsy Maggie Zhou @zmagg ScaleConf 2015

Upload: zmagg

Post on 16-Jul-2015

435 views

Category:

Software


4 download

TRANSCRIPT

Page 1: Logical sharding at Etsy

“Just” shard it Logical sharding at Etsy

Maggie Zhou @zmagg

ScaleConf 2015

Page 2: Logical sharding at Etsy
Page 3: Logical sharding at Etsy
Page 4: Logical sharding at Etsy

@zmagg

What’s the infrastructure?

Page 5: Logical sharding at Etsy

@zmagg

L A M P

Page 6: Logical sharding at Etsy

@zmagg

L A M P

Page 7: Logical sharding at Etsy

@zmagg

Page 8: Logical sharding at Etsy

@zmagg

2019

http://surge.omniti.com/2011/speakers/ross-snyder

Page 9: Logical sharding at Etsy

@zmagg

L A M P

yay, databases!

Page 10: Logical sharding at Etsy

@zmagg

tickets index

shard 1 shard 2 shard 3 shard n

Page 11: Logical sharding at Etsy

@zmagg

??

Page 12: Logical sharding at Etsy

@zmagg

master-master replication

Page 13: Logical sharding at Etsy

@zmagg

tickets index

shard 1 shard 2 shard 3 shard n

ORM

Page 14: Logical sharding at Etsy

@zmagg

tickets index

shard 1 shard 2 shard 3 shard n

shard n+1

Page 15: Logical sharding at Etsy

@zmagg

Capacity planning included setting aside

2 months for load balancing

Page 16: Logical sharding at Etsy

@zmagg

2 months??

Page 17: Logical sharding at Etsy

@zmagg

2010’s solution is Not Scaling

Page 18: Logical sharding at Etsy

@zmagg

shop_1 shop_2 shop_3 shop_4

shop_2

shard N shard N+1

index

shop_2

…writes locked

1)

2)

3) update index, remove lock

Page 19: Logical sharding at Etsy

@zmagg

Migrations were• error prone

• arbitrary

• developers had to be aware

• created orphaned data

• locked shops & users out of changes for up to hours

• slow

Page 20: Logical sharding at Etsy

@zmagg

Migrations wereError prone? We can fix the errors! We can make the script more robust!

Arbitrary? We can write tooling that figures out which rows are right to migrate off for optimal balance!

Developers had to be aware? We can write better interfaces!

Page 21: Logical sharding at Etsy

@zmagg

Orphaned data?• Deletes are expensive, so we didn’t do them.

• Migrations created orphaned data on old hosts that were still picked up by full table scans (downstream systems: search, analytics).

!

Page 22: Logical sharding at Etsy

@zmagg

Migrations were• error prone

• arbitrary

• developers had to be aware

• created orphaned data

• locked shops & users out of changes for up to hours

• slow

Page 23: Logical sharding at Etsy

@zmagg

Let’s talk about slowness…

What if we could move more than one row at a time?

Page 24: Logical sharding at Etsy

@zmagg

<<enter logical sharding>>

Page 25: Logical sharding at Etsy

@zmagg

Well, okay, why didn’t we do this in the first

place?

Page 26: Logical sharding at Etsy

@zmagg

You have to run your site to learn your data

access patterns.

Page 27: Logical sharding at Etsy

@zmagg

photo from gizmodo http://gizmodo.com/5632095/justin-bieber-has-dedicated-servers-at-twitter

Page 28: Logical sharding at Etsy

@zmagg

listing from https://www.etsy.com/shop/NausicaaDistribution

Page 29: Logical sharding at Etsy

@zmagg

listing from https://www.etsy.com/shop/Mugnificentart

Page 30: Logical sharding at Etsy

@zmagg

Scaling Pinterest, 2012 “to increase capacity, a server is replicated and the new replica is responsible for some DBs”

slide 32 https://speakerdeck.com/yashh/scaling-pinterest

Sharding & IDs at Instagram, 2012 “simply by moving a set of logical shards from one database to another"

http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram

Page 31: Logical sharding at Etsy

@zmagg

shard1 shard2

… shard 10

db_host1 db_host2 db_host3 db_hostN

shard1 shard2

… shard10

shard11 shard12

… shard20

shard21 shard22

… shard30

!

!

Page 32: Logical sharding at Etsy

@zmagg

db_host1

shard1 shard2

… shard10

db_host2

shard2

Page 33: Logical sharding at Etsy

@zmagg

Page 34: Logical sharding at Etsy
Page 35: Logical sharding at Etsy
Page 36: Logical sharding at Etsy
Page 37: Logical sharding at Etsy

How did we move onto this new architecture?

Page 38: Logical sharding at Etsy

@zmagg

We used the old row-based migration framework, one last time.

Page 39: Logical sharding at Etsy

We migrated all the sharded data (120TB) without

downtime or developer disruption.

Page 40: Logical sharding at Etsy

@zmagg

Page 41: Logical sharding at Etsy

@zmagg

It took us 5 months to move that data using

the old way.

Page 42: Logical sharding at Etsy

@zmagg

Today it would take us a few hours.

Page 43: Logical sharding at Etsy

@zmagg

What’d we build?• made tooling logical shard aware

• generated database configs so that we could have two-way O(1) mappings between logical shards and physical hosts.

Page 44: Logical sharding at Etsy

@zmagg

DNS related outage…

Page 45: Logical sharding at Etsy

@zmagg

Nice side effects!• schema changes (alters) now run in parallel,

significantly faster!

• no more orphaned data!

• downstream analytics systems replicate faster at the shard-by-shard level!

Page 46: Logical sharding at Etsy

@zmagg

The Future

Page 47: Logical sharding at Etsy

Do you have any limits to how many shard hosts you can add?

Page 48: Logical sharding at Etsy

@zmagg

Well, yes…It’s 999…because of technical debt.

Code base riddled with checks like this:

!

!

(We can support ~16x data growth, on current day hardware)

Page 49: Logical sharding at Etsy

@zmagg

2019

http://surge.omniti.com/2011/speakers/ross-snyder

Page 50: Logical sharding at Etsy

@zmagg

Resources• Using a tickets database for ID generation: http://code.flickr.net/2010/02/08/ticket-servers-distributed-

unique-primary-keys-on-the-cheap/

• Using master-master replication: https://codeascraft.com/2012/04/20/two-sides-for-salvation/

• Etsy’s shard architecture, the 2012 edition: http://www.percona.com/live/mysql-conference-2012/sessions/etsy-shard-architecture-starts-s-and-ends-hard

• Scaling Etsy, 2011: http://surge.omniti.com/2011/speakers/ross-snyder

• Instagram’s shard & id architecture: http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram

• Scaling Pinterest: https://speakerdeck.com/yashh/scaling-pinterest

• Morgue, Etsy’s postmortem tool https://github.com/etsy/morgue

Page 51: Logical sharding at Etsy

Questions?

Page 52: Logical sharding at Etsy

“Just” shard it Logical sharding at Etsy

Maggie Zhou @zmagg

ScaleConf 2015