Using random strings (e.g., generated by UUID.randomUUID().toString()) as primary keys is generally a bad idea for performance reasons:
Slower Comparisons: String comparisons are inherently slower than numeric comparisons. Databases can compare integers much more efficiently at the hardware level.
First of all, you should not store UUIDs as strings. As fs111 mentioned, UUIDs are 16 bytes, so they fit into a uint128_t. Postgres supports UUIDs natively, but even if your database does not support 128-bit integers, you can store them in a fixed 16-byte column such as BINARY(16), and databases will be quite fast at comparing them.
Also, Snowflake IDs are basically superseded by UUIDv7. You get a bigger space than Snowflake IDs, plus all the goodies of both Snowflake IDs and UUIDv4.
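To make the "16 bytes, not a string" point concrete, here is a minimal Java sketch that packs a java.util.UUID into a 16-byte array and back. The class name UuidBytes is made up for illustration; how the array is then bound to a column depends on your database and driver, which this sketch does not cover.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical helper (not from any library): converts between UUID and its raw 16 bytes.
final class UuidBytes {
    private UuidBytes() {}

    // Big-endian layout: most significant 64 bits first, then the least significant 64 bits.
    static byte[] toBytes(UUID id) {
        return ByteBuffer.allocate(16)
                .putLong(id.getMostSignificantBits())
                .putLong(id.getLeastSignificantBits())
                .array();
    }

    // Inverse of toBytes; rejects anything that is not exactly 16 bytes long.
    static UUID fromBytes(byte[] bytes) {
        if (bytes.length != 16) {
            throw new IllegalArgumentException("expected 16 bytes, got " + bytes.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        return new UUID(buf.getLong(), buf.getLong());
    }
}
```

The binary form is 16 bytes, against 36 characters for the usual hyphenated text form, so both storage and comparisons deal with less data.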
UUIDv7 requires generating a reasonably good quality random number every time you generate an ID, while Snowflake only needs to increment a counter, so it doesn’t seem like a straight upgrade. One could imagine a 128-bit Snowflake-like format with, say, a 48-bit timestamp, a 64-bit worker identifier (which can be randomly generated), and a 16-bit counter.
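The 48/64/16 layout imagined above could be sketched roughly like this in Java. Everything here is an assumption made for illustration: the class name WideSnowflake, the byte order, and the choice to borrow the next millisecond when the counter runs out or the clock steps backwards. It is not an existing format.

```java
import java.nio.ByteBuffer;
import java.security.SecureRandom;

// Hypothetical 128-bit Snowflake-like generator: 48-bit timestamp, 64-bit worker id, 16-bit counter.
final class WideSnowflake {
    // Randomness is needed only once, when the worker starts.
    private final long workerId = new SecureRandom().nextLong();
    private long lastMillis = -1;
    private int counter = 0;

    synchronized byte[] next() {
        return next(System.currentTimeMillis());
    }

    // Takes the clock reading as a parameter so the behaviour is easy to check.
    synchronized byte[] next(long nowMillis) {
        if (nowMillis > lastMillis) {
            lastMillis = nowMillis;   // new millisecond: restart the counter
            counter = 0;
        } else if (++counter > 0xFFFF) {
            lastMillis++;             // counter exhausted: borrow the next millisecond
            counter = 0;
        }
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putShort((short) (lastMillis >>> 32)); // upper 16 bits of the 48-bit timestamp
        buf.putInt((int) lastMillis);              // lower 32 bits of the timestamp
        buf.putLong(workerId);                     // 64-bit worker identifier
        buf.putShort((short) counter);             // 16-bit per-millisecond counter
        return buf.array();
    }
}
```

Within one worker the IDs are strictly increasing when compared as unsigned bytes; uniqueness across workers rests entirely on the random 64-bit worker identifiers not colliding.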
Fair. But I would argue that for most applications, the probability of generating the same 74 random bits twice on the same timestamp is quite low.
Also, nothing guarantees that two sequence numbers won’t conflict on the same timestamp. What prevents a conflict is the worker identifier, which you want to generate randomly in your proposal. How is that different from generating the entire remaining 74 bits randomly (like UUIDv7)?
But you’re right, UUIDv7 does not guarantee full uniqueness. I would just disregard the probability as “too low”, as for UUIDv4.
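To put a rough number on “too low”: UUIDv7 leaves 74 bits for random data (128 minus the 48-bit timestamp, 4 version bits and 2 variant bits). Assuming those bits are uniform and independent, and that n IDs are generated across all machines within the same millisecond, the usual birthday approximation gives:

```latex
P(\text{collision}) \approx \frac{n(n-1)}{2 \cdot 2^{74}} = \frac{n(n-1)}{2^{75}},
\qquad
n = 10^{6}: \quad P \approx \frac{10^{12}}{3.8 \times 10^{22}} \approx 2.6 \times 10^{-11}
```

So even a (hypothetical) million IDs in a single millisecond would collide with a probability of roughly 2.6 in 100 billion for that millisecond.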
The RFC explains how to generate UUIDv7 using a counter, so you only occasionally need to get fresh randomness.
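As a rough illustration of that idea, here is a Java sketch loosely modelled on the monotonic-random approach described in RFC 9562: fresh random bits are drawn only when the millisecond changes, and within the same millisecond the 62-bit random field is simply incremented. The class name CounterUuidV7 and the plain +1 increment are simplifying assumptions, not the RFC's reference method.

```java
import java.security.SecureRandom;
import java.util.UUID;

// Hypothetical UUIDv7 generator that reseeds its random bits once per millisecond.
final class CounterUuidV7 {
    private static final long RAND_B_MASK = (1L << 62) - 1;

    private final SecureRandom random = new SecureRandom();
    private long lastMillis = -1;
    private long randA; // 12 bits after the version nibble
    private long randB; // 62 bits after the variant bits, doubles as the counter

    synchronized UUID next() {
        return next(System.currentTimeMillis());
    }

    // Takes the clock reading as a parameter so the behaviour is easy to check.
    synchronized UUID next(long nowMillis) {
        if (nowMillis > lastMillis) {
            lastMillis = nowMillis;       // new millisecond: draw fresh randomness
            reseed();
        } else if (++randB > RAND_B_MASK) {
            lastMillis++;                 // counter overflow: borrow the next millisecond
            reseed();
        }
        long msb = (lastMillis << 16) | (0x7L << 12) | randA; // timestamp, version 7, rand_a
        long lsb = (0x2L << 62) | randB;                      // variant bits 10, rand_b
        return new UUID(msb, lsb);
    }

    private void reseed() {
        randA = random.nextInt(1 << 12);
        randB = random.nextLong() & RAND_B_MASK;
    }
}
```

One caveat with this simplification: IDs produced in the same millisecond are consecutive, so they are easier to guess than fully random ones. Also note that java.util.UUID.compareTo compares the two halves as signed longs, so ordering checks are safer with Long.compareUnsigned.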
UUIDs are 128-bit numbers, not strings.
I'm really sorry I missed this. I will recheck and update.
Thank you very much for informing me.
Did you generate most of the text with an LLM and edit it?