Consistent Hashing Ring
This blog represents What’s The Funda (WTF) around consistent hashing and DHT (Distributed Hash Tables), Databases use cases where it is used.
Lets understand the traditional hashing mechanism using following diagram:
Figure 1. How does Hash Table/Map works?
Pay attention to some of the following aspects as per the above diagram:
Lets see if we can apply the traditional technique such as above to store a record (having some key) to a specific computer (node) out of a list of computers (nodes).
The hashing technique, thus, can be said to be inconsistent.
The requirement is to store key-value pairs on different computer nodes as a function of hash of key value and number of computer nodes.
Let us say the hash value is calculated as following:
hash_value = hash(key);
Computer on which the data gets stored is calculated as following:
computerThatStoresRecord = func(hash_value, noOfComputerNodes);
So far so good. The above calculation can be done on a centralized node and the data/record is appropriately sent to the node and stored/written/persisted there. The same function is used for reading/retrieving the data/record from a specific node.
Problem arises when the the value of noOfComputerNodes changes. This can change due to the fact that the one can add one or more nodes or, one or more nodes can shut down.
Due to the way that record’s node location s determined, it would start getting wrong. This is because different value of computerThatStoresRecord would get calculated as a result of changed value of noOfComputerNodes. As a result, all the records may end up getting written on different nodes.
So, there is a problem. How to solve this problem? Consistent Hashing is the answer.
Consistent hashing is a special hashing technique using which only a set of key-value pairs get relocated to new buckets unlike traditional hashing technique such as that mentioned above. Following happens in case of consistent hashing:
Figure 2. Consistent hashing explained
Let’s apply consistent hashing technique to store key-value pair in the network of computer nodes.
Figure 3. Consistent Hashing Ring
Consistent hashing is used for distributing data across the database cluster. This is used to minimize data reorganization when nodes are added or removed. Following are some of the databases where consistent hashing technique is used:
Consistent hashing is used in the system based on Distributed Hash Tables, or, in other words, distributed key-value stores. For example, web caching, distributed file system etc.
We’ve all been in that meeting. The dashboard on the boardroom screen is a sea…
When building a regression model or performing regression analysis to predict a target variable, understanding…
If you've built a "Naive" RAG pipeline, you've probably hit a wall. You've indexed your…
If you're starting with large language models, you must have heard of RAG (Retrieval-Augmented Generation).…
If you've spent any time with Python, you've likely heard the term "Pythonic." It refers…
Large language models (LLMs) have fundamentally transformed our digital landscape, powering everything from chatbots and…