Explicitly redistributing isn’t required since inserting and removing hash values of servers in the sorted_keys data structure automatically takes care of it. Since sorted_keys is sorted by hash values, while adding and removing the order of other server hash values doesn't change.
The article is really incredible, very helpful to understand the consistent hashing for a beginner like me.
I was wondering If it’s possible to have an edge case like 2 different virtual nodes sharing the same hash code(collision). If so how do we handle the remove servers when 2 virtual happen to share the same hash code.
Great observation! While hash collisions are rare, they can still happen. In this implementation, the sorted_keys structure may store the same hash value for different virtual nodes. Currently, collisions aren’t explicitly handled, but they can be resolved by storing pairs {hash_value, virtual_node_id} instead of just hash values. This way, when removing a server, we can accurately identify and delete the correct virtual node, even if multiple nodes share the same hash.
Shouldn’t you be redistributing when you remove and add a node in the Python implementation?
Explicitly redistributing isn’t required since inserting and removing hash values of servers in the sorted_keys data structure automatically takes care of it. Since sorted_keys is sorted by hash values, while adding and removing the order of other server hash values doesn't change.
Got it, Thanks for answering so quickly
The article is really incredible, very helpful to understand the consistent hashing for a beginner like me.
I was wondering If it’s possible to have an edge case like 2 different virtual nodes sharing the same hash code(collision). If so how do we handle the remove servers when 2 virtual happen to share the same hash code.
Great observation! While hash collisions are rare, they can still happen. In this implementation, the sorted_keys structure may store the same hash value for different virtual nodes. Currently, collisions aren’t explicitly handled, but they can be resolved by storing pairs {hash_value, virtual_node_id} instead of just hash values. This way, when removing a server, we can accurately identify and delete the correct virtual node, even if multiple nodes share the same hash.