Knowledge Test: Consistent Hashing Explained

2022-12-18 553 words 3 minutes

/consistent-hashing-explained/What-is-consistent-hashing.webp

Contents

This article contains the questions and solutions related to the article: Consistent hashing explained

Disclaimer: The system design questions are subjective. This article is written based on the research I have done on the topic and might differ from real-world implementations. Feel free to share your feedback and ask questions in the comments.

Consistent hashing is used in distributed systems design such as the URL shortener and Pastebin. I recommend reading the related articles to improve your system design skills.

Questions

Why partition the data?
How does a hot spot occur?
What is gossip protocol?
What are tradeoffs of the cache replication?
What is the optimal configuration of spread and the load for the high performance of a cache server?
What are the drawbacks of key range partitioning?
How to locate a node in static hash partitioning?
What are the techniques to resolve a collision in a hashmap?
What is a virtual node?
What data structure is used to store the positions of the nodes on the hash ring?
What is the asymptotic complexity to add a node in consistent hashing?
What hash functions are used in consistent hashing?
What are the benefits of consistent hashing?
What are the drawbacks of consistent hashing?
What are the consistent hashing examples?
What are some popular variants of consistent hashing?

Solutions

Data partitioning helps to parallelize task execution, and isolate data entities resulting in improved performance and scalability of the system
The large share of data and a high volume of requests to a node results in a hot spot
A peer-to-peer communication technique used by nodes to periodically exchange state information
Only a limited data set is cached and consistency between cache replicas is expensive to maintain
The optimal configuration for the high performance of a cache server is to keep the spread and the load at a minimum
The data set is not necessarily uniformly distributed among the cache servers as there might be more keys in a certain key range
Node ID = hash(key) mod N, where N is the array’s length and the key is the key of the data object
Open addressing and Chaining
The technique of assigning multiple positions to a node is known as a virtual node
The self-balancing binary search tree data structure is used to store the positions of the nodes on the hash ring
O(k/n + logn), where O(k/n) for redistribution of keys and O(logn) for binary search tree traversal
MD5, SHA-1, SHA-256 ,MurmurHash, xxHash, MetroHash, or SipHash1–3 are used in consistent hashing
Horizontally scalable and minimized data movement when the number of nodes changes
Cascading failure due to hot spots and non-uniform distribution of nodes and data
The discord server (discord space or chat room) is hosted on a set of nodes. The client of the discord chat application identifies the set of nodes that hosts a specific discord server using consistent hashing
Multi-probe consistent hashing and Consistent hashing with bounded loads

What to learn next?

Get the powerful template to approach system design for FREE on newsletter sign-up:

License

CC BY-NC-ND 4.0: This license allows reusers to copy and distribute the content in this article in any medium or format in unadapted form only, for noncommercial purposes, and only so long as attribution is given to the creator. The original article must be backlinked.