consistent hashing in your python applications · python dict() implementation hash(key) &...
TRANSCRIPT
![Page 1: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)](https://reader035.vdocuments.site/reader035/viewer/2022070823/5f2bb61000b6e45f4a4a25df/html5/thumbnails/1.jpg)
Consistent Hashingin your python applications
Europython 2017
![Page 2: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)](https://reader035.vdocuments.site/reader035/viewer/2022070823/5f2bb61000b6e45f4a4a25df/html5/thumbnails/2.jpg)
@ultrabugGentoo Linux developerCTO at Numberly
![Page 3: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)](https://reader035.vdocuments.site/reader035/viewer/2022070823/5f2bb61000b6e45f4a4a25df/html5/thumbnails/3.jpg)
History & main use casesDistributed (web) caching (Akamai)
P2P (Chord & BitTorrent)
Distributed databases (data distribution / sharding)
● Amazon DynamoDB● Cassandra / ScyllaDB● Riak● CockroachDB
![Page 4: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)](https://reader035.vdocuments.site/reader035/viewer/2022070823/5f2bb61000b6e45f4a4a25df/html5/thumbnails/4.jpg)
MAPPINGreferential -> information
![Page 5: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)](https://reader035.vdocuments.site/reader035/viewer/2022070823/5f2bb61000b6e45f4a4a25df/html5/thumbnails/5.jpg)
Phonebookname -> phone number
![Page 6: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)](https://reader035.vdocuments.site/reader035/viewer/2022070823/5f2bb61000b6e45f4a4a25df/html5/thumbnails/6.jpg)
Referential selection Logical operation INFORMATION
lookup efficiency
Map logic
![Page 7: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)](https://reader035.vdocuments.site/reader035/viewer/2022070823/5f2bb61000b6e45f4a4a25df/html5/thumbnails/7.jpg)
MAPkey -> value
![Page 8: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)](https://reader035.vdocuments.site/reader035/viewer/2022070823/5f2bb61000b6e45f4a4a25df/html5/thumbnails/8.jpg)
Python dict(){key: value}
![Page 9: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)](https://reader035.vdocuments.site/reader035/viewer/2022070823/5f2bb61000b6e45f4a4a25df/html5/thumbnails/9.jpg)
Python dict() is a Hash Table
![Page 10: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)](https://reader035.vdocuments.site/reader035/viewer/2022070823/5f2bb61000b6e45f4a4a25df/html5/thumbnails/10.jpg)
Hash function ( key ) Logical operation LOCATION
Hash Table logic
implementation
![Page 11: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)](https://reader035.vdocuments.site/reader035/viewer/2022070823/5f2bb61000b6e45f4a4a25df/html5/thumbnails/11.jpg)
Python dict() implementationhash(key) & (size of array - 1) = array index
hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123
hash(‘b’) = 12544037731 & 11 = 3
1 |
hash(‘c’) = 12672038114 & 11 = 2 2 | value: ‘coco’
3 | value: None
11 |
...
Array (in memory)
![Page 12: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)](https://reader035.vdocuments.site/reader035/viewer/2022070823/5f2bb61000b6e45f4a4a25df/html5/thumbnails/12.jpg)
Distribution (balancing) Accuracy LOCATION
efficiency
scaling
Key factors to consider
![Page 13: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)](https://reader035.vdocuments.site/reader035/viewer/2022070823/5f2bb61000b6e45f4a4a25df/html5/thumbnails/13.jpg)
Python dict efficiency & scalinghash(key) & (size of array - 1) = array index
hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123
hash(‘b’) = 12544037731 & 11 = 3
1 | MEMORY
hash(‘c’) = 12672038114 & 11 = 2 2 | value: ‘coco’
3 | value: None
11 | MEMORY
...
hash() = uneven distribution
Optimized for fast lookups O(1)Memory inefficient (probing)
![Page 14: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)](https://reader035.vdocuments.site/reader035/viewer/2022070823/5f2bb61000b6e45f4a4a25df/html5/thumbnails/14.jpg)
Distributed Hash Tables(DHT)
![Page 15: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)](https://reader035.vdocuments.site/reader035/viewer/2022070823/5f2bb61000b6e45f4a4a25df/html5/thumbnails/15.jpg)
Split your key space into buckets
bucket h o v
bucket h o v
bucket h o v
the hash function will impact the size of each bucket
hash(key) operator
hash(key) operator
hash(key) operator
![Page 16: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)](https://reader035.vdocuments.site/reader035/viewer/2022070823/5f2bb61000b6e45f4a4a25df/html5/thumbnails/16.jpg)
Distribute your buckets to servers
hash(key) operator SERVER 0 bucket 0
hash(key) operator SERVER 1 bucket 1
hash(key) operator SERVER 2 bucket 2
what’s the best operator function to find the server hosting the bucket for my key ?
![Page 17: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)](https://reader035.vdocuments.site/reader035/viewer/2022070823/5f2bb61000b6e45f4a4a25df/html5/thumbnails/17.jpg)
md5(key) % (number of buckets) = server
Naive DHT implementation
int(md5(b'd').hexdigest(), 16) % 3 = 0 SERVER 0 bucket 0
% 3 = 1 SERVER 1 bucket 1
% 3 = 2 SERVER 2 bucket 2
simple & looking good...
int(md5(b'e').hexdigest(), 16)
int(md5(b'f').hexdigest(), 16)
![Page 18: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)](https://reader035.vdocuments.site/reader035/viewer/2022070823/5f2bb61000b6e45f4a4a25df/html5/thumbnails/18.jpg)
md5(key) % (number of buckets) = server
Naive DHT implementation
int(md5(b'd').hexdigest(), 16) % 4 = 1 (was 0) SERVER 0 bucket 0
% 4 = 2 (was 1) SERVER 1 bucket 1
% 4 = 3 (was 2) SERVER 2 bucket 2
...until you add / remove a bucket/server!
int(md5(b'e').hexdigest(), 16)
int(md5(b'f').hexdigest(), 16)
% 4 = 1
SERVER 3 bucket 3
int(md5(b'g').hexdigest(), 16) SERVER 1 bucket 1
![Page 19: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)](https://reader035.vdocuments.site/reader035/viewer/2022070823/5f2bb61000b6e45f4a4a25df/html5/thumbnails/19.jpg)
n/(n+1)~ fraction of remapped keys
![Page 20: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)](https://reader035.vdocuments.site/reader035/viewer/2022070823/5f2bb61000b6e45f4a4a25df/html5/thumbnails/20.jpg)
HELP!we need consistency
![Page 21: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)](https://reader035.vdocuments.site/reader035/viewer/2022070823/5f2bb61000b6e45f4a4a25df/html5/thumbnails/21.jpg)
The Hash Ring
![Page 22: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)](https://reader035.vdocuments.site/reader035/viewer/2022070823/5f2bb61000b6e45f4a4a25df/html5/thumbnails/22.jpg)
hash(server 1)
Place your servers on the continuum (ring)
hash(server 2)
hash(server 0)
![Page 23: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)](https://reader035.vdocuments.site/reader035/viewer/2022070823/5f2bb61000b6e45f4a4a25df/html5/thumbnails/23.jpg)
Keys’ bucket is on the next server in the ring
SERVER 1
SERVER 2
SERVER 0
hash(key)
hash(key)
![Page 24: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)](https://reader035.vdocuments.site/reader035/viewer/2022070823/5f2bb61000b6e45f4a4a25df/html5/thumbnails/24.jpg)
1/n~ fraction of remapped keys
![Page 25: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)](https://reader035.vdocuments.site/reader035/viewer/2022070823/5f2bb61000b6e45f4a4a25df/html5/thumbnails/25.jpg)
Uneven partitions lead to hotspots
server 0
server 2
server 1hash functions are not perfect
![Page 26: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)](https://reader035.vdocuments.site/reader035/viewer/2022070823/5f2bb61000b6e45f4a4a25df/html5/thumbnails/26.jpg)
Which hash function to use ?Cryptographic hash functions
● MD5● SHA1● SHA256
Non cryptographic hash functions
● CityHash (google)● Murmur (v3)
standard optimized for key lookups
adoption fast
need of C libsneed conversion to int
SHAX - MD5 - CityHash128 - Murmur3 - CityHash64 - CityHash32speed
![Page 27: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)](https://reader035.vdocuments.site/reader035/viewer/2022070823/5f2bb61000b6e45f4a4a25df/html5/thumbnails/27.jpg)
Hash Rings vnodes & weights mitigate hotspots
reduces load variance on servers
![Page 28: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)](https://reader035.vdocuments.site/reader035/viewer/2022070823/5f2bb61000b6e45f4a4a25df/html5/thumbnails/28.jpg)
My preciouuus!
![Page 29: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)](https://reader035.vdocuments.site/reader035/viewer/2022070823/5f2bb61000b6e45f4a4a25df/html5/thumbnails/29.jpg)
Consistent Hashing implementations in python
ConsistentHashing
consistent_hash
hash_ring
python-continuum
uhashring
A simple implement of consistent hashing
The algorithm is the same as libketama
Using md5 as hashing function
Using md5 as hashing function
Full featured, ketama compatible
![Page 30: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)](https://reader035.vdocuments.site/reader035/viewer/2022070823/5f2bb61000b6e45f4a4a25df/html5/thumbnails/30.jpg)
uhashring
![Page 31: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)](https://reader035.vdocuments.site/reader035/viewer/2022070823/5f2bb61000b6e45f4a4a25df/html5/thumbnails/31.jpg)
Example use case #1Database instances distribution
DB1
DB2
DB3
DB4
client A
client B
client C
client D
![Page 32: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)](https://reader035.vdocuments.site/reader035/viewer/2022070823/5f2bb61000b6e45f4a4a25df/html5/thumbnails/32.jpg)
Example use case #1Database instances distribution
![Page 33: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)](https://reader035.vdocuments.site/reader035/viewer/2022070823/5f2bb61000b6e45f4a4a25df/html5/thumbnails/33.jpg)
Example use case #1Database instances distribution
![Page 34: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)](https://reader035.vdocuments.site/reader035/viewer/2022070823/5f2bb61000b6e45f4a4a25df/html5/thumbnails/34.jpg)
Example use case #2Disk & network I/O distribution
disk1
disk2
disk3
disk4
task A
task B
task C
task D
![Page 35: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)](https://reader035.vdocuments.site/reader035/viewer/2022070823/5f2bb61000b6e45f4a4a25df/html5/thumbnails/35.jpg)
Example use case #3Log & tracing consistency
worker1
worker2
worker3
worker4
user_id A
user_id B
user_id C
user_id D
![Page 36: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)](https://reader035.vdocuments.site/reader035/viewer/2022070823/5f2bb61000b6e45f4a4a25df/html5/thumbnails/36.jpg)
Example use case #4python-memcached consolidation
cache1
cache2
cache3
cache4
‘potato’
‘coconut’
‘tomato’
‘raspberry’
![Page 37: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)](https://reader035.vdocuments.site/reader035/viewer/2022070823/5f2bb61000b6e45f4a4a25df/html5/thumbnails/37.jpg)
Live demo raffle
List of GIFsOne of the GIF is the winnerEvery participant is a node (bucket)hash(WINNER_GIF_URL) picks the winner node
![Page 38: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)](https://reader035.vdocuments.site/reader035/viewer/2022070823/5f2bb61000b6e45f4a4a25df/html5/thumbnails/38.jpg)
http://ep17.nbly.co
(silly live demo)
![Page 39: Consistent Hashing in your python applications · Python dict() implementation hash(key) & (size of array - 1) = array index hash(‘a’) = 12416037344 & 11 = 0 0 | value: 123 hash(‘b’)](https://reader035.vdocuments.site/reader035/viewer/2022070823/5f2bb61000b6e45f4a4a25df/html5/thumbnails/39.jpg)
Thanksgithub.com/ultrabug/ep2017github.com/ultrabug/uhashring@ultrabug