consistent hashing in your python applications · python dict() implementation hash(key) &...

39
Consistent Hashing in your python applications Europython 2017

Upload: others

Post on 08-Jul-2020

15 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)

Consistent Hashingin your python applications

Europython 2017

Page 2: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)

@ultrabugGentoo Linux developerCTO at Numberly

Page 3: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)

History & main use casesDistributed (web) caching (Akamai)

P2P (Chord & BitTorrent)

Distributed databases (data distribution / sharding)

● Amazon DynamoDB● Cassandra / ScyllaDB● Riak● CockroachDB

Page 4: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)

MAPPINGreferential -> information

Page 5: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)

Phonebookname -> phone number

Page 6: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)

Referential selection Logical operation INFORMATION

lookup efficiency

Map logic

Page 7: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)

MAPkey -> value

Page 8: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)

Python dict(){key: value}

Page 9: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)

Python dict() is a Hash Table

Page 10: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)

Hash function ( key ) Logical operation LOCATION

Hash Table logic

implementation

Page 11: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)

Python dict() implementationhash(key) & (size of array - 1) = array index

hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123

hash(‘b’) = 12544037731 & 11 = 3

1 |

hash(‘c’) = 12672038114 & 11 = 2 2 | value: ‘coco’

3 | value: None

11 |

...

Array (in memory)

Page 12: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)

Distribution (balancing) Accuracy LOCATION

efficiency

scaling

Key factors to consider

Page 13: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)

Python dict efficiency & scalinghash(key) & (size of array - 1) = array index

hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123

hash(‘b’) = 12544037731 & 11 = 3

1 | MEMORY

hash(‘c’) = 12672038114 & 11 = 2 2 | value: ‘coco’

3 | value: None

11 | MEMORY

...

hash() = uneven distribution

Optimized for fast lookups O(1)Memory inefficient (probing)

Page 14: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)

Distributed Hash Tables(DHT)

Page 15: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)

Split your key space into buckets

bucket h o v

bucket h o v

bucket h o v

the hash function will impact the size of each bucket

hash(key) operator

hash(key) operator

hash(key) operator

Page 16: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)

Distribute your buckets to servers

hash(key) operator SERVER 0 bucket 0

hash(key) operator SERVER 1 bucket 1

hash(key) operator SERVER 2 bucket 2

what’s the best operator function to find the server hosting the bucket for my key ?

Page 17: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)

md5(key) % (number of buckets) = server

Naive DHT implementation

int(md5(b'd').hexdigest(), 16) % 3 = 0 SERVER 0 bucket 0

% 3 = 1 SERVER 1 bucket 1

% 3 = 2 SERVER 2 bucket 2

simple & looking good...

int(md5(b'e').hexdigest(), 16)

int(md5(b'f').hexdigest(), 16)

Page 18: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)

md5(key) % (number of buckets) = server

Naive DHT implementation

int(md5(b'd').hexdigest(), 16) % 4 = 1 (was 0) SERVER 0 bucket 0

% 4 = 2 (was 1) SERVER 1 bucket 1

% 4 = 3 (was 2) SERVER 2 bucket 2

...until you add / remove a bucket/server!

int(md5(b'e').hexdigest(), 16)

int(md5(b'f').hexdigest(), 16)

% 4 = 1

SERVER 3 bucket 3

int(md5(b'g').hexdigest(), 16) SERVER 1 bucket 1

Page 19: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)

n/(n+1)~ fraction of remapped keys

Page 20: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)

HELP!we need consistency

Page 21: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)

The Hash Ring

Page 22: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)

hash(server 1)

Place your servers on the continuum (ring)

hash(server 2)

hash(server 0)

Page 23: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)

Keys’ bucket is on the next server in the ring

SERVER 1

SERVER 2

SERVER 0

hash(key)

hash(key)

Page 24: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)

1/n~ fraction of remapped keys

Page 25: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)

Uneven partitions lead to hotspots

server 0

server 2

server 1hash functions are not perfect

Page 26: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)

Which hash function to use ?Cryptographic hash functions

● MD5● SHA1● SHA256

Non cryptographic hash functions

● CityHash (google)● Murmur (v3)

standard optimized for key lookups

adoption fast

need of C libsneed conversion to int

SHAX - MD5 - CityHash128 - Murmur3 - CityHash64 - CityHash32speed

Page 27: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)

Hash Rings vnodes & weights mitigate hotspots

reduces load variance on servers

Page 28: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)

My preciouuus!

Page 29: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)

Consistent Hashing implementations in python

ConsistentHashing

consistent_hash

hash_ring

python-continuum

uhashring

A simple implement of consistent hashing

The algorithm is the same as libketama

Using md5 as hashing function

Using md5 as hashing function

Full featured, ketama compatible

Page 30: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)

uhashring

Page 31: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)

Example use case #1Database instances distribution

DB1

DB2

DB3

DB4

client A

client B

client C

client D

Page 32: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)

Example use case #1Database instances distribution

Page 33: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)

Example use case #1Database instances distribution

Page 34: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)

Example use case #2Disk & network I/O distribution

disk1

disk2

disk3

disk4

task A

task B

task C

task D

Page 35: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)

Example use case #3Log & tracing consistency

worker1

worker2

worker3

worker4

user_id A

user_id B

user_id C

user_id D

Page 36: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)

Example use case #4python-memcached consolidation

cache1

cache2

cache3

cache4

‘potato’

‘coconut’

‘tomato’

‘raspberry’

Page 37: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)

Live demo raffle

List of GIFsOne of the GIF is the winnerEvery participant is a node (bucket)hash(WINNER_GIF_URL) picks the winner node

Page 38: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)

http://ep17.nbly.co

(silly live demo)

Page 39: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)

Thanksgithub.com/ultrabug/ep2017github.com/ultrabug/uhashring@ultrabug