Most distributed systems that operate at scale rely on unique IDs.
For example, consider order tracking in e-commerce: each order placed by a customer is assigned a unique ID, allowing the system to track it through every stage—order processing, payment, shipping, and delivery.
But how do we generate these IDs in a way that’s fast, unique, reliable, and scalable?
In this article, we’ll dive into 7 popular approaches to generating unique IDs in distributed systems.
1. UUID (Universally Unique Identifier)
UUIDs, also known as GUIDs (Globally Unique Identifiers), are 128-bit numbers widely used for generating unique identifiers across distributed systems due to their simplicity and lack of dependency on a centralized source.
In this setup, each server can generate unique IDs independently.
UUIDs come in multiple versions:
UUID v1 (Time-Based): Uses timestamp and machine-specific information like the MAC address.
UUID v3 (Name-Based with MD5): Generated by hashing a namespace and name using MD5.
UUID v4 (Random): Uses random values for most bits, providing a high degree of uniqueness.
UUID v5 (Name-Based with SHA-1): Similar to v3 but uses SHA-1 hashing for stronger uniqueness.
The most commonly used version is UUID v4.
Format (UUID v4)
Example: 550e8400-e29b-41d4-a716-446655440000
Randomness (122 bits): Most of the UUID is composed of random hexadecimal digits (0–9 or a–f).
Version (4 bits): The third block’s first character is always 4, identifying it as a version 4 UUID.
Variant (2 bits): The first character of the fourth block is 8, 9, a, or b. It encodes the variant and indicates that the UUID follows the RFC 4122 standard.
Code Example (Python)
import uuid
# Generate a random UUID (version 4)
uuid_v4 = uuid.uuid4()
print(f"Generated UUID v4: {uuid_v4}")
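The version and variant positions described in the format breakdown can be checked directly on a generated value. This is a small sketch using only the standard uuid module; the variable names are illustrative.

```python
import uuid

u = uuid.uuid4()
text = str(u)
blocks = text.split("-")  # five hyphen-separated blocks of hex digits

print(text)
print("version field:", u.version)                  # version number stored in the UUID
print("third block starts with:", blocks[2][0])     # the version character
print("fourth block starts with:", blocks[3][0])    # the variant character
print("variant:", u.variant)
```

The third block should begin with 4 and the fourth block with one of 8, 9, a, or b, matching the layout above.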
Pros:
Decentralized: UUIDs can be generated independently across servers.
Collision Resistance: With 122 random bits, UUID v4 has a collision probability so low it’s practically negligible.
To visualize: Even when generating 1 billion UUIDs per second, it would take roughly 86 years to reach a 50% chance of a single collision.
Ease of Implementation: Most programming languages provide built-in libraries for generating UUIDs.
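To put a number on the collision resistance mentioned in the list above, here is a back-of-the-envelope estimate using the standard birthday approximation. It assumes 122 random bits per UUID v4 and a hypothetical generation rate of 1 billion IDs per second; both the function name and the rate are illustrative choices.

```python
import math

RANDOM_BITS = 122  # random bits in a UUID v4 (128 minus 4 version bits and 2 variant bits)


def ids_for_collision_probability(p, bits=RANDOM_BITS):
    """Approximate how many random IDs are needed before a collision occurs with probability p."""
    space = 2 ** bits
    # Birthday approximation: n ~ sqrt(2 * N * ln(1 / (1 - p)))
    return math.sqrt(2 * space * math.log(1 / (1 - p)))


n_half = ids_for_collision_probability(0.5)
rate_per_second = 1_000_000_000          # assumed generation rate
seconds_per_year = 365.25 * 24 * 3600
years = n_half / rate_per_second / seconds_per_year

print(f"IDs needed for a 50% collision chance: {n_half:.2e}")
print(f"Years at {rate_per_second:,} IDs per second: {years:.0f}")
```

The approximation is only accurate when the number of IDs is small relative to the ID space, which holds comfortably here.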
Cons:
Large Size: UUIDs consume 128 bits, which can be excessive for some storage-sensitive systems.
Not Sequential: Random UUIDs (v4) have no inherent order, so new keys land at arbitrary positions in index structures like B-Trees, which can cause page splits and slower inserts.
UUIDs are ideal when you need globally unique IDs across distributed systems without central coordination and when order isn’t important (e.g., Order IDs in E-commerce, Session IDs for User Authentication).
2. Database Auto-Increment
Database auto-increment is a feature in relational databases that automatically generates unique, sequential numeric IDs whenever a new record is inserted into a table.
Typically, the numbering starts from an initial value (often 1) and increments by a fixed amount (commonly 1) for each new row.
Example in SQL:
Here, the user_id column will start from 1 and automatically increment for each new row, generating unique values.
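A minimal sketch of this behavior, using Python's built-in sqlite3 module with an in-memory database. The users table and its name column are assumptions made for illustration, and the AUTOINCREMENT keyword shown is SQLite's syntax; other databases spell it differently (for example, AUTO_INCREMENT in MySQL).

```python
import sqlite3

# In-memory database so the example leaves nothing behind
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "  user_id INTEGER PRIMARY KEY AUTOINCREMENT,"
    "  name TEXT NOT NULL"
    ")"
)

inserted_ids = []
for name in ["alice", "bob", "carol"]:
    # user_id is omitted from the INSERT; the database assigns it
    cursor = conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
    inserted_ids.append(cursor.lastrowid)

print(inserted_ids)  # [1, 2, 3]
conn.close()
```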
This approach works well for small applications with just one database node.
However, in distributed environments, depending on a single database node for ID generation can quickly become a bottleneck and a single point of failure.
To make auto-increment work in distributed systems, here are two effective strategies:
2.1 Range-Based ID Allocation
In this approach, each database node is assigned a unique range of IDs, allowing them to generate IDs independently and avoid conflicts or overlaps with other nodes.
For example, in a three-node setup:
Node 1 can use IDs from 1 to 100000.
Node 2 can use IDs from 100001 to 200000.
Node 3 can use IDs from 200001 to 300000.
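The three-node example above can be sketched as follows. RangeIdAllocator is a hypothetical helper, not part of any database; it simply hands out IDs from a pre-assigned inclusive range and fails loudly once the range runs out.

```python
class RangeIdAllocator:
    """Hands out IDs from a fixed, pre-assigned inclusive range."""

    def __init__(self, start, end):
        self.next_value = start
        self.end = end

    def next_id(self):
        if self.next_value > self.end:
            # The node must be given a fresh range before it can continue
            raise RuntimeError("ID range exhausted; a new range must be assigned")
        value = self.next_value
        self.next_value += 1
        return value


RANGE_SIZE = 100_000
# Node 1 gets 1..100000, node 2 gets 100001..200000, node 3 gets 200001..300000
nodes = {
    node: RangeIdAllocator((node - 1) * RANGE_SIZE + 1, node * RANGE_SIZE)
    for node in (1, 2, 3)
}

first_ids = {node: allocator.next_id() for node, allocator in nodes.items()}
print(first_ids)  # {1: 1, 2: 100001, 3: 200001}
```

This sketch is not thread-safe and keeps its counter in memory; a real node would need to persist the counter and protect it from concurrent access.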
Limitations of Range-Based Allocation:
Range Exhaustion: High-traffic nodes may exhaust their assigned range quickly, requiring reallocation or range expansion.
Complex Management: As nodes are added or removed, reassigning and managing ranges can become complex.
Waste of ID Space: Uneven traffic across nodes may leave some ranges underutilized.
2.2 Step-Based Auto-Increment
In step-based auto-increment, each node generates IDs with a predefined step size.
For example, if the step size is 3:
Node 1 generates IDs as 1, 4, 7, 10, ….
Node 2 generates IDs as 2, 5, 8, 11, ….
Node 3 generates IDs as 3, 6, 9, 12, ….
This approach ensures each node generates unique IDs independently, but adding or removing nodes may require reconfiguring the step size.
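A short sketch of the step-based sequences listed above, using itertools.count. The step_ids helper is an illustrative name; each node starts at its own node number and steps by the total node count.

```python
from itertools import count, islice


def step_ids(node_number, node_count):
    """Yield node_number, node_number + node_count, node_number + 2 * node_count, ..."""
    return count(start=node_number, step=node_count)


NODE_COUNT = 3
for node in (1, 2, 3):
    # Show the first four IDs each node would produce
    print(f"Node {node}:", list(islice(step_ids(node, NODE_COUNT), 4)))
```

Because the offset and step are baked into every node, changing the number of nodes means choosing a new step that does not collide with IDs already issued.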
Pros of Database Auto-Increment
Simplicity: Straightforward to set up, as most relational databases support it natively.
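Sequential Order: On a single node, IDs are generated in strictly increasing order, making it easier to sort records by insertion order. With range-based or step-based allocation, this ordering holds only within each node.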
Low Storage Overhead: IDs are typically small integers, making them efficient for storage and indexing.
Cons of Database Auto-Increment:
Coordination Overhead: In a distributed setup, managing ranges or step increments requires careful setup and ongoing monitoring to avoid collisions.
Predictable IDs: Sequential IDs can be predictable, which may pose security risks in some applications (e.g., an attacker could guess the next ID).
ID Exhaustion: High insertion rates can exhaust the integer range, especially with smaller data types.
Database Auto-Increment is useful when you need simple, sequential IDs (e.g., relational database primary keys).
3. Snowflake ID (Twitter's Approach)
The Snowflake ID generation system, developed by Twitter, is a method for generating 64-bit IDs that are:
Time-based
Ordered
Distributed-system friendly
It was created to handle the need for high-throughput, time-ordered IDs that can scale horizontally across multiple data centers and machines.
These IDs are not only unique but also sequential within each machine, making them highly efficient for indexing and ordering operations.
Format
Example Snowflake ID (binary):
0000011010011001110010110010101001010101100010101110000000000001
Breakdown:
1. Sign Bit (1 bit): Always set to 0 to ensure the ID is positive.
2. Timestamp (41 bits): The next 41 bits encode the timestamp in milliseconds since the Snowflake epoch (often set to November 4, 2010). Because the timestamp occupies the high-order bits, the IDs can be sorted chronologically.
3. Datacenter ID (5 bits): The next 5 bits represent the data center or region ID, which allows for up to 32 (2^5) unique data centers.
4. Machine ID (5 bits): The following 5 bits represent the machine (or worker) ID within the data center, allowing for 32 machines per data center.
5. Sequence Number (12 bits): The last 12 bits are a sequence counter, which resets every millisecond. This counter allows each machine to generate up to 4,096 (2^12) unique IDs per millisecond.
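The bit layout above can be sketched as a small generator. This is an illustrative implementation under stated assumptions, not Twitter's actual code: the epoch constant, the class and function names, and the choice to raise an error when the clock moves backwards are all assumptions made for this example.

```python
import threading
import time

EPOCH_MS = 1288834974657  # assumed custom epoch: November 4, 2010 (UTC), in milliseconds
DATACENTER_BITS = 5
MACHINE_BITS = 5
SEQUENCE_BITS = 12

MAX_DATACENTER = (1 << DATACENTER_BITS) - 1
MAX_MACHINE = (1 << MACHINE_BITS) - 1
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1

MACHINE_SHIFT = SEQUENCE_BITS
DATACENTER_SHIFT = MACHINE_SHIFT + MACHINE_BITS
TIMESTAMP_SHIFT = DATACENTER_SHIFT + DATACENTER_BITS


class SnowflakeGenerator:
    def __init__(self, datacenter_id, machine_id):
        if not 0 <= datacenter_id <= MAX_DATACENTER:
            raise ValueError("datacenter_id out of range")
        if not 0 <= machine_id <= MAX_MACHINE:
            raise ValueError("machine_id out of range")
        self.datacenter_id = datacenter_id
        self.machine_id = machine_id
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    @staticmethod
    def _now_ms():
        return time.time_ns() // 1_000_000

    def next_id(self):
        with self.lock:
            now = self._now_ms()
            if now < self.last_ms:
                # One possible policy: refuse to issue IDs rather than risk duplicates
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    # All 4,096 values used in this millisecond; wait for the next one
                    while now <= self.last_ms:
                        now = self._now_ms()
            else:
                self.sequence = 0
            self.last_ms = now
            return (
                ((now - EPOCH_MS) << TIMESTAMP_SHIFT)
                | (self.datacenter_id << DATACENTER_SHIFT)
                | (self.machine_id << MACHINE_SHIFT)
                | self.sequence
            )


def decode(snowflake_id):
    """Split an ID back into its four fields."""
    return {
        "timestamp_ms": (snowflake_id >> TIMESTAMP_SHIFT) + EPOCH_MS,
        "datacenter_id": (snowflake_id >> DATACENTER_SHIFT) & MAX_DATACENTER,
        "machine_id": (snowflake_id >> MACHINE_SHIFT) & MAX_MACHINE,
        "sequence": snowflake_id & MAX_SEQUENCE,
    }


generator = SnowflakeGenerator(datacenter_id=1, machine_id=7)
first, second = generator.next_id(), generator.next_id()
print(first, decode(first))
print("second > first:", second > first)
```

Uniqueness across machines depends on every generator being given a distinct datacenter ID and machine ID pair; how those are assigned is outside this sketch.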
Pros
Time-Ordered: Snowflake IDs include a timestamp, making them naturally ordered by generation time. This is beneficial for indexing and time-series data.
Decentralized: Each machine can generate unique IDs independently, without requiring a central coordination server.
High Throughput: With 12 bits for the sequence, each machine can generate up to 4,096 unique IDs per millisecond, making Snowflake IDs suitable for high-traffic environments.
Compact and Efficient: At 64 bits, Snowflake IDs are more storage-efficient than UUIDs (128 bits).
Cons
Clock Synchronization: Snowflake ID generation depends on synchronized clocks. If the system clock moves backward, it can lead to duplicate IDs or ID generation errors.
Limited Capacity: Each machine can only generate up to 4,096 IDs per millisecond. If a higher rate is required, additional machines or other scaling solutions are needed.
Snowflake IDs are ideal when you need unique, time-ordered IDs in distributed systems that require high throughput and scalability (e.g., social media posts, event logs).
4. Redis-Based ID Generation
Redis, an in-memory key-value store, can also be used for ID generation due to its atomic operations and low-latency performance.
Here’s how Redis-based ID generation works:
Initialize a Key: Set up a Redis key to store the current ID value.
Increment on Demand: Whenever a new ID is needed, an application node increments the counter using Redis’s atomic INCR or INCRBY command.
Return Unique ID: The incremented counter value is guaranteed to be unique, and it’s returned to the application.
Redis guarantees atomicity, so no two calls to generate_id() will ever receive the same ID, even if multiple nodes are concurrently accessing the Redis server.
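The steps above can be sketched as follows. This is a minimal sketch, assuming a client object that exposes an incr(key) method, as the redis-py client does. The InMemoryCounter class is a hypothetical stand-in so the example runs without a Redis server, and the order:id key name is made up for illustration.

```python
import threading


class InMemoryCounter:
    """Hypothetical stand-in that mimics the incr() call of a Redis client."""

    def __init__(self):
        self._values = {}
        self._lock = threading.Lock()

    def incr(self, key, amount=1):
        # The lock plays the role of Redis's atomic INCR
        with self._lock:
            self._values[key] = self._values.get(key, 0) + amount
            return self._values[key]


def generate_id(client, key="order:id"):
    """Return the next ID by atomically incrementing the shared counter."""
    return client.incr(key)


client = InMemoryCounter()
new_ids = [generate_id(client) for _ in range(3)]
print(new_ids)  # [1, 2, 3]
```

With a running server and the redis-py package installed, a redis.Redis(host="localhost", port=6379) instance could be passed as the client instead of the stand-in.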
Pros
Atomicity: Redis’s INCR and INCRBY commands are atomic, ensuring each generated ID is unique and sequential without any risk of collision.
High Throughput: As an in-memory database, Redis provides very low latency, making it ideal for high-speed ID generation.
Simplicity: Setting up Redis for ID generation is straightforward and requires minimal configuration.
Sequential IDs: Redis-generated IDs are sequential, making them suitable for ordered indexing in databases or applications where chronological order is important.
Cons
Single Point of Failure: Using a single Redis instance as the ID generator can become a bottleneck and a potential single point of failure.
Scalability Limitations: While Redis can handle high throughput, using it as a centralized ID generator limits horizontal scaling because every request depends on a single Redis instance.
Redis-based ID generation is useful when you need high-speed, centralized ID generation with sequential order, and the setup is primarily single-node.
5. Nano ID
Nano ID is a small, URL-friendly, unique string ID generator designed for simplicity, flexibility, and performance in distributed systems.
Created as a modern alternative to UUID, Nano ID has gained popularity in frontend applications and modern web development.
Unlike UUIDs, Nano ID doesn’t follow a rigid structure, making it highly adaptable to different applications.
Format
By default, Nano ID generates a 21-character ID using a URL-safe, base64-style alphabet. Each character is randomly chosen from a 64-character set (A-Z, a-z, 0-9, "-", "_"), giving about 126 bits of randomness (21 characters × 6 bits), comparable to the 122 random bits of a UUID v4.
However, you can customize the length and character set to meet your application’s specific requirements.
Example: 7QLiKDgL-WG4E8z6xyVc0
Here’s how to generate Nano IDs in Python with the nanoid package:
import nanoid
# Generate an ID with the default alphabet and length (21 characters)
default_id = nanoid.generate()  # e.g. "V1StGXR8_Z5jdHi6B-myT"
# Custom length
custom_id = nanoid.generate(size=10)  # e.g. "IRFa-VaY2b"
# Custom alphabet and length (digits only)
numeric_id = nanoid.generate(
    alphabet='1234567890',
    size=6
)  # e.g. "482913"
Pros
Compact and Readable: With a default length of 21 characters, Nano ID is shorter and more readable than UUIDs, which are 36 characters in their standard text form.
URL-Friendly: Nano ID uses a URL-safe character set by default, making it ideal for use in URLs without needing additional encoding.
Decentralized: Each node can generate unique IDs independently with minimal risk of collision.
Customizable: You can adjust the length and character set to suit specific needs.
High Performance: Nano ID’s generation is fast, making it ideal for scenarios requiring rapid creation of many unique IDs.
Cons
Non-Sequential: Nano IDs are purely random and lack sequential ordering, which can lead to fragmentation in database indexes.
Collision Probability: Reducing the length of Nano ID increases the risk of collision, so longer IDs may be needed for critical applications.
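To see how length and alphabet size affect collision risk, the amount of randomness in an ID can be estimated as length × log2(alphabet size). The entropy_bits helper below is an illustrative name, and the three configurations are examples only.

```python
import math


def entropy_bits(alphabet_size, length):
    """Bits of randomness in an ID of `length` characters drawn uniformly from the alphabet."""
    return length * math.log2(alphabet_size)


default_bits = entropy_bits(64, 21)   # default alphabet and length
short_bits = entropy_bits(64, 10)     # default alphabet, 10 characters
numeric_bits = entropy_bits(10, 6)    # digits only, 6 characters

print(f"64 symbols x 21 chars: {default_bits:.1f} bits")
print(f"64 symbols x 10 chars: {short_bits:.1f} bits")
print(f"10 symbols x  6 chars: {numeric_bits:.1f} bits")
```

A six-digit numeric ID has only one million possible values, so collisions become likely after roughly a thousand IDs; short or restricted-alphabet IDs are best reserved for small ID spaces or paired with a uniqueness check.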
Nano ID is ideal for generating short, customizable, URL-friendly IDs in applications that don’t require time-ordering, such as URLs, tokens, and frontend identifiers.
6. Hash-Based IDs
Hash-Based IDs are unique identifiers generated by applying cryptographic hash functions to specific data inputs.
They're deterministic, meaning the same input always produces the same ID, making them ideal for systems that need consistent identifiers, like deduplication and caching.
Format
The format of hash-based IDs depends on the hashing algorithm used, such as MD5, SHA-1, or SHA-256.
These IDs are typically encoded as hexadecimal strings and can vary in length depending on the hash function:
MD5: 128 bits (32 hexadecimal characters)
SHA-1: 160 bits (40 hexadecimal characters)
SHA-256: 256 bits (64 hexadecimal characters)
The choice of hashing algorithm depends on the application's requirements for uniqueness, security, and collision resistance.
Example URL: https://example.com/some-page
SHA-256 hash output: 66e9c37ef3c04d3df238cd7d6b6b524f06c6e6dc9892e13c46f6d59f212dad0e
Code Example:
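A minimal sketch using Python's standard hashlib module. The hash_id helper and its optional length parameter are assumptions made for illustration; truncating the digest shortens the ID but raises the chance of collisions.

```python
import hashlib


def hash_id(data, length=None):
    """Return the SHA-256 hex digest of `data`, optionally truncated to `length` characters."""
    digest = hashlib.sha256(data.encode("utf-8")).hexdigest()
    return digest if length is None else digest[:length]


url = "https://example.com/some-page"
full_id = hash_id(url)               # 64 hexadecimal characters
short_id = hash_id(url, length=16)   # shortened form of the same digest

print(full_id)
print(short_id)
print("same input, same ID:", full_id == hash_id(url))
```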
Pros
Deterministic: The same input will always generate the same ID.
Collision-Resistant: Strong hash algorithms like SHA-256 provide high collision resistance, making it extremely unlikely for two different inputs to produce the same hash.
Decentralized: IDs can be generated independently across nodes in a distributed system without needing central coordination.
Cons
Non-Sequential: Hash-based IDs are non-sequential, which can lead to fragmentation in database indexes, slowing down query performance.
Length: Some hash functions, like SHA-256, produce long IDs (64 characters) that may be inefficient for certain applications.
Collision Probability: Using weaker hashes (e.g., MD5) increases the risk of collisions, which can cause issues in systems that require strict uniqueness.
No Metadata: Hash-based IDs are pure hashes and don’t contain metadata information like timestamps or machine identifiers.
Hash-Based IDs are useful when you need deterministic, unique IDs based on input data, like content or URLs, rather than random values (e.g., deduplication, URL shorteners, caching systems).
7. ULID (Universally Unique Lexicographically Sortable Identifier)
A ULID is a 26-character, URL-safe string that combines:
Timestamp (first 10 characters)
Randomness (last 16 characters)
This format produces unique, readable and lexicographically sortable IDs.
Unlike UUIDs, which lack natural ordering, ULIDs embed a timestamp component and use a compact, URL-friendly base32 encoding.
Format
01AN4Z07BY      79KA1307SR9X4MV3
|--------|      |--------------|
Timestamp       Randomness
10 chars        16 chars
(48 bits)       (80 bits)
Timestamp (48 bits): The first 10 characters represent the timestamp in milliseconds since the Unix epoch (January 1, 1970). This allows ULIDs to be naturally sorted by creation time.
Randomness (80 bits): The remaining 16 characters are random, ensuring uniqueness even when multiple ULIDs are generated within the same millisecond.
There are libraries available for generating ULIDs in many programming languages, including JavaScript, Python, Java, and Go.
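To make the layout concrete, here is a standard-library sketch that builds a ULID-shaped string by hand: a 48-bit millisecond timestamp followed by 80 random bits, encoded with Crockford's Base32 alphabet. The function names are illustrative, and this sketch does not implement the monotonic ordering within the same millisecond that some ULID libraries offer.

```python
import os
import time

CROCKFORD_ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # omits I, L, O, U


def encode_base32(value, length):
    """Encode a non-negative integer as `length` Crockford Base32 characters."""
    chars = []
    for _ in range(length):
        chars.append(CROCKFORD_ALPHABET[value & 0x1F])  # lowest 5 bits
        value >>= 5
    return "".join(reversed(chars))


def generate_ulid(timestamp_ms=None):
    """Build a 26-character ID: 10 timestamp characters + 16 random characters."""
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    randomness = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    return encode_base32(timestamp_ms, 10) + encode_base32(randomness, 16)


ulid = generate_ulid()
print(ulid, len(ulid))
```

Because the timestamp characters come first and the alphabet is in ascending ASCII order, plain string comparison sorts IDs from different milliseconds by creation time.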
Pros
Lexicographically Sortable: ULIDs are time-ordered and naturally sortable, making them suitable for time-series data.
Compact and URL-Friendly: With 26 characters in base32 format, ULIDs are shorter than UUIDs, making them suitable for embedding in URLs.
Decentralized: ULIDs can be generated independently on multiple nodes, as each ID is based on a timestamp and a random component, reducing the need for centralized coordination.
Readable and Error-Resistant: ULIDs use Crockford's Base32 alphabet, which omits easily confused characters such as I, L, O, and U.
Cons
Limited Time Precision: ULIDs use milliseconds for the timestamp, which may not be precise enough for high-frequency systems that need IDs at microsecond or nanosecond levels.
Limited Popularity: Although gaining popularity, ULIDs are still less widely supported than UUIDs.
No Embedded Metadata: ULIDs encode only a timestamp; unlike Snowflake IDs, they do not include other metadata such as a machine ID or data center ID.
ULIDs are a great choice when you need unique, time-ordered, URL-friendly IDs that can be generated independently without central coordination (e.g., time-series data, event logs).
Thank you so much for reading.
I hope you have a lovely day!
See you soon,
Ashish