Description
Currently, the uuid.UUID class features a fields argument and property which is a six-tuple of integers:
- 32-bit time_low,
- 16-bit time_mid,
- 16-bit time_hi_version,
- 8-bit clock_seq_hi_variant,
- 8-bit clock_seq_low, and
- 48-bit node.
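As a quick illustration of the current behaviour, the check below uses only the existing uuid.UUID API to show that fields is nothing more than UUID.int cut at fixed bit offsets, whatever the version or variant bits say. The example UUID string is arbitrary.

```python
import uuid

# Arbitrary 128-bit value; its version/variant bits are irrelevant here,
# because UUID.fields is a purely positional split of UUID.int.
u = uuid.UUID("12345678-9abc-def0-1234-56789abcdef0")
n = u.int

# Recompute the six fields by hand: 32 + 16 + 16 + 8 + 8 + 48 = 128 bits.
manual = (
    n >> 96,                # time_low (32 bits)
    (n >> 80) & 0xFFFF,     # time_mid (16 bits)
    (n >> 64) & 0xFFFF,     # time_hi_version (16 bits)
    (n >> 56) & 0xFF,       # clock_seq_hi_variant (8 bits)
    (n >> 48) & 0xFF,       # clock_seq_low (8 bits)
    n & 0xFFFF_FFFF_FFFF,   # node (48 bits)
)

print(u.fields == manual)              # True
print([hex(f) for f in u.fields])
# ['0x12345678', '0x9abc', '0xdef0', '0x12', '0x34', '0x56789abcdef0']
```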
At the moment, those fields are only relevant for version 1 UUIDs, since UUIDv3 and UUIDv5 are based on MD5 and SHA-1 respectively. However, the recent RFC 9562, which supersedes RFC 4122, introduces another time-based version, UUIDv6 (which reuses the 60-bit Gregorian-epoch timestamp of UUIDv1 but orders its bits differently, most significant first), as well as UUIDv7 (based on a Unix-epoch millisecond timestamp) and UUIDv8, whose layouts leave some bits up to the implementation.
Here is what we can do for now:
- For version 7, we can have:
  - a cryptographically secure 74-bit random chunk, split into 12-bit and 62-bit chunks, or
  - a monotonic UUID with a 12-bit sub-millisecond precision chunk.
- For version 8, we can have:
  - a time-based UUID with 10-ns precision, using a 60-bit timestamp and 62 bits of random data, or
  - a name-based UUID that uses secure hashing algorithms such as SHA-256, SHA-3 or SHAKE-256 (RFC 9562 includes an example of a SHA-256-based UUIDv8), or
  - a non-cryptographic 122-bit random value split into independent 48-, 12- and 62-bit chunks. Those chunks can also be supplied by the user if they want cryptographically secure values (although I would suggest generating a UUIDv4 and changing the version and variant bits manually).
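The options above mostly differ in how the bits outside the version and variant fields are filled. The sketch below is a minimal illustration, not a proposed API: the helper names (_with_version_and_variant, uuid7_random, uuid7_submillisecond, uuid8_from_uuid4) are hypothetical, and it assumes secrets.randbits as the randomness source. It builds a UUIDv7 with 74 random bits, a UUIDv7 whose 12-bit chunk holds the sub-millisecond fraction scaled by 4096 (similar in spirit to the RFC's "increased clock precision" approach), and a UUIDv8 obtained by relabelling a UUIDv4.

```python
import secrets
import time
import uuid
from typing import Optional

MASK_48 = (1 << 48) - 1


def _with_version_and_variant(value: int, version: int) -> uuid.UUID:
    """Hypothetical helper: overwrite the 4 version bits, set the variant to 0b10."""
    value &= ~(0xF << 76)        # clear the version nibble
    value |= version << 76
    value &= ~(0b11 << 62)       # clear the two variant bits
    value |= 0b10 << 62
    return uuid.UUID(int=value)


def uuid7_random(unix_ts_ms: Optional[int] = None) -> uuid.UUID:
    """unix_ts_ms (48) | ver (4) | rand_a (12) | var (2) | rand_b (62)."""
    if unix_ts_ms is None:
        unix_ts_ms = time.time_ns() // 1_000_000
    rand_a = secrets.randbits(12)
    rand_b = secrets.randbits(62)
    value = ((unix_ts_ms & MASK_48) << 80) | (rand_a << 64) | rand_b
    return _with_version_and_variant(value, 7)


def uuid7_submillisecond(unix_ts_ns: Optional[int] = None) -> uuid.UUID:
    """Same layout, but the 12-bit chunk holds the sub-millisecond fraction."""
    if unix_ts_ns is None:
        unix_ts_ns = time.time_ns()
    unix_ts_ms, ns_in_ms = divmod(unix_ts_ns, 1_000_000)
    subsec_a = ns_in_ms * 4096 // 1_000_000   # always in [0, 4096)
    rand_b = secrets.randbits(62)
    value = ((unix_ts_ms & MASK_48) << 80) | (subsec_a << 64) | rand_b
    return _with_version_and_variant(value, 7)


def uuid8_from_uuid4() -> uuid.UUID:
    """Reuse the 122 random bits of a UUIDv4 and relabel them as version 8."""
    return _with_version_and_variant(uuid.uuid4().int, 8)
```

For instance, uuid7_random(1_700_000_000_000).int >> 80 gives back 1_700_000_000_000. Note that uuid7_random is not ordered within a millisecond, and uuid7_submillisecond is only ordered down to its 12-bit resolution (1 ms / 4096, roughly 244 ns).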
With the addition of those variants, we have at least one UUID version other than UUIDv1 that features time-related fields. In particular, we need to decide whether fields[0] is the first RFC field of the UUID or always its first 32 bits. I personally think that fields should represent the RFC fields, namely that fields[0] is a 32-bit integer corresponding to the 32 LSB (resp. MSB) of the 60-bit timestamp for UUIDv1 (resp. UUIDv6).
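To see what fields[0] holds in each case, the sketch below encodes the same 60-bit Gregorian timestamp both as a UUIDv1 and as a UUIDv6. build_v1 and build_v6 are hypothetical helpers written for this illustration (they are not part of the uuid module), and the timestamp, clock sequence and node are example values.

```python
import uuid


def build_v1(ts: int, clock_seq: int, node: int) -> uuid.UUID:
    """UUIDv1: time_low | time_mid | ver + time_hi (timestamp stored LSB first)."""
    return uuid.UUID(fields=(
        ts & 0xFFFF_FFFF,                    # 32 least significant bits
        (ts >> 32) & 0xFFFF,                 # next 16 bits
        (1 << 12) | ((ts >> 48) & 0x0FFF),   # version 1 + 12 most significant bits
        0x80 | ((clock_seq >> 8) & 0x3F),    # variant 0b10 + high clock_seq bits
        clock_seq & 0xFF,
        node,
    ))


def build_v6(ts: int, clock_seq: int, node: int) -> uuid.UUID:
    """UUIDv6: time_high | time_mid | ver + time_low (timestamp stored MSB first)."""
    return uuid.UUID(fields=(
        ts >> 28,                            # 32 most significant bits
        (ts >> 12) & 0xFFFF,                 # next 16 bits
        (6 << 12) | (ts & 0x0FFF),           # version 6 + 12 least significant bits
        0x80 | ((clock_seq >> 8) & 0x3F),
        clock_seq & 0xFF,
        node,
    ))


# 60-bit count of 100-ns intervals since 1582-10-15 (example value).
ts = 0x1EC9414C232AB00
v1 = build_v1(ts, clock_seq=0x33C8, node=0x9F6BDECED846)
v6 = build_v6(ts, clock_seq=0x33C8, node=0x9F6BDECED846)

print(v1)                                  # c232ab00-9414-11ec-b3c8-9f6bdeced846
print(v6)                                  # 1ec9414c-232a-6b00-b3c8-9f6bdeced846
print(v1.fields[0] == ts & 0xFFFF_FFFF)    # True: 32 LSB of the timestamp
print(v6.fields[0] == ts >> 28)            # True: 32 MSB of the timestamp
```

With the current positional semantics, v6.time_low therefore returns the 32 most significant bits of the timestamp, even though its name suggests the opposite.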
For UUIDv7 with sub-millisecond precision, the fields are a bit different, since the layout becomes unix_ts_ms (48) | ver (4) | subsec_a (12) | var (2) | counter (62). We should therefore decide how to split those fields into six and whether it makes sense to expose the corresponding properties. A similar question arises for UUIDv8.
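One possible reading of that layout is sketched below. The hypothetical helper v7_rfc_fields recovers three meaningful values rather than six, and the hand-assembled example (arbitrary values) shows how the existing time_* properties cut across them: time_low only sees the upper 32 bits of unix_ts_ms, and time_hi_version mixes the version nibble with subsec_a.

```python
import uuid


def v7_rfc_fields(u: uuid.UUID) -> tuple:
    """Hypothetical: split unix_ts_ms (48) | ver (4) | subsec_a (12) | var (2) | counter (62)."""
    n = u.int
    unix_ts_ms = n >> 80                 # 48-bit Unix timestamp in milliseconds
    subsec_a = (n >> 64) & 0xFFF         # 12 bits, skipping the version nibble
    counter = n & ((1 << 62) - 1)        # 62 bits, skipping the variant bits
    return unix_ts_ms, subsec_a, counter


# Assemble an example UUIDv7 by hand (arbitrary values).
unix_ts_ms, subsec_a, counter = 1_700_000_000_123, 1871, 42
u = uuid.UUID(int=(unix_ts_ms << 80) | (7 << 76) | (subsec_a << 64)
              | (0b10 << 62) | counter)

print(v7_rfc_fields(u))                            # (1700000000123, 1871, 42)
# The existing six-way split cuts across those RFC fields:
print(u.time_low == (unix_ts_ms >> 16))            # True: upper 32 bits only
print(u.time_mid == (unix_ts_ms & 0xFFFF))         # True: lower 16 bits
print(u.time_hi_version == ((7 << 12) | subsec_a)) # True: version + subsec_a
```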
While we could change the semantics of the fields and time_* properties, doing so could break applications that assume fields is independent of any timestamp interpretation (my understanding is that fields does not depend on the version or on the RFC, and is only a way to partition the UUID's 128-bit integer value into 32+16+16+8+8+48 bits; that partitioning is only meaningful for UUIDv1).
Therefore, I really don't know how to deal with those time-based properties. I'd like to avoid breaking long-standing applications, but at the same time I don't want a property to misrepresent its value. If we don't change anything, uuidv6.time_low would actually return the 32 highest bits of the timestamp...
EDIT: Should this actually be a PEP? Since UUIDv7 and UUIDv8 leave parts of their layout up to the implementation, maybe a PEP would be a good idea?