This proposal arose from a discussion with @jacobly0 on the flaws of the current bit-pointer syntax in the context of #22915.
Background
Zig has a thing called "bit pointers". They're very niche; many people probably don't realise that they exist. But they look like this:
```zig
*align(8:10:5) u9
```
This pointer type does not mean that there is a `u9` value at the address the pointer represents. Instead, at that address (which is 8-byte aligned) there is a 5-byte "backing integer", which, when interpreted as packed memory, contains a `u9` at bit offset 10. Bit pointers are niche, but under our current semantics for evaluating code like `foo.bar = 123`, they need to exist, because that code is effectively doing `(&foo.bar).* = 123` under the hood.
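For a feel for where this comes up in status-quo Zig, here's a small, hypothetical test; the field layout mirrors the example type above, and the exact alignment/host size the compiler reports for `&s.b` is target-dependent:

```zig
const std = @import("std");

// Hypothetical layout matching the example above: a `u9` starting at bit
// offset 10 of a 35-bit backing integer.
const S = packed struct(u35) {
    a: u10,
    b: u9,
    c: u16,
};

test "packed struct field assignment goes through a bit pointer" {
    var s: S = .{ .a = 0, .b = 0, .c = 0 };
    // `&s.b` cannot be a plain byte pointer, because `b` lives at a bit offset;
    // its type is a bit pointer along the lines of `*align(8:10:5) u9`.
    s.b = 123; // effectively `(&s.b).* = 123`
    try std.testing.expectEqual(@as(u9, 123), s.b);
}
```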
However, bit pointers have some problems. One is in the syntax; the syntax shown above is extremely opaque, and the values seem to be shoved into the `align` qualifier for no clear reason. I don't think any new user would guess that random colon-separated numbers in the "align" field completely change how the pointer works.
A more significant problem, which is what first inspired this issue, is that this pointer does not necessarily contain sufficient information to correctly lower loads and stores. The correct lowering depends on the compiler's representation of the backing type, which the Zig specification will not define for "weird" integer types like `u35`. However, the given "host size" (5 in the above example) is insufficient to reverse-engineer the host integer, because it is given in byte units. Encoding the host size in bytes does constrain the allowed representations of integers, but it does so in an extremely subtle way.
Lastly, there is the problem of vectors. If you take a pointer to a vector element (i.e. `&vec[1]`), then that can't really be a byte-level pointer, because vector elements may be bit-packed -- again, it is the implementation's choice what the representation here is. But representing the actual bit offset in the pointer isn't appropriate, because that offset depends on the implementation, so the type of something like `&vec[1]` would become entirely implementation-defined -- plus, it would tie vectors to Zig's "packed memory layout" concept, which may be confusing depending on how vectors are actually represented by the implementation. To solve this case, we currently have... a fourth field in the `align` qualifier!
```zig
*align(2:0:5:1) u2
// The address is '2'-byte aligned...
// ...and points to a vector of *length* '5' (note the distinct meaning) of `u2`...
// ...the '0' is, uh, unused...
// ...and we're referring to the element at index 1!
```
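For concreteness, here's a similarly hypothetical test showing this type arising from `&vec[1]` in status-quo Zig (the exact alignment printed is the implementation's choice):

```zig
const std = @import("std");

test "pointer to a vector element" {
    var vec: @Vector(5, u2) = @splat(0);
    // The type of `&vec[1]` is a vector-index bit pointer along the lines of
    // the `*align(2:0:5:1) u2` shown above.
    const elem_ptr = &vec[1];
    elem_ptr.* = 3;
    try std.testing.expectEqual(@as(u2, 3), vec[1]);
}
```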
This type is pretty ridiculous, to be honest. It would be nice to unify these concepts more effectively, under one umbrella which provides sufficient information for any representation, and also combine it with a more intuitive syntax. If only there was a proposal for a new feature that did just that...
Proposal
Eliminate the current bit-pointer forms, and replace them with the following:
```zig
*packed(BackingType, offset) EmbeddedType
```
Before I explain this, here are some examples:
```zig
// Pointer to a u9 at bit offset 10 into a `packed struct(u35)`, which is naturally aligned (say 8 bytes).
*align(8:10:5) u9   // old
*packed(u35, 10) u9 // new

// Pointer to the element at index 1 of a `@Vector(3, u8)`, which is naturally aligned (say 2 bytes).
*align(2:0:3:1) u8            // old
*packed(@Vector(3, u8), 8) u8 // new

// Pointer to a u3 at bit offset 5 into a `packed struct(u10)`, which is aligned to an unnatural 16-byte boundary.
*align(16:5:2) u3            // old
*align(16) packed(u10, 5) u3 // new
```
Hopefully those examples gave you a bit of a feel for it, but here's the idea. The `packed(T, o)` qualifier means that the pointer's address does not actually refer to the pointer's element type at the byte level; instead, it refers to a `T`. Then, in the bit-level representation (in the `@bitCast` sense; see also #19755) of that `T`, the pointer element type is found starting at bit offset `o` (where 0 means LSB). For instance, `*packed(u35, 10) u9` says: the address holds a `u35`, and bits 10 through 18 of its bit-level representation are the `u9` being pointed at.
We use the keyword `packed` because this concept is highly related to that of "packed memory"; it's perhaps not exactly correct (under the accepted #19755, vectors are no longer packable types), but it's much closer than associating it with "alignment". Critically, this gives a user who hasn't encountered this niche feature before at least a vague idea of what it might be about, assuming that they know about the meaning of `packed` in Zig (which they probably do if they're encountering this concept!).
The problem of not having enough information to lower loads and stores is solved by this proposal, and we can prove that by implementing loads/stores of these pointers in userland:
```zig
//! These implementations are for demonstration purposes only; this code would not be in the standard
//! library or anything. In compiler implementation terms, it's possible that `Air.Legalize` could
//! perform a transformation akin to this.
//! Here, `a`, `B`, `o`, and `E` stand for the pointer's alignment, backing type, bit offset, and
//! element type respectively.

fn load(ptr: *align(a) packed(B, o) E) E {
    const Bits = @Int(.unsigned, @bitSizeOf(B));
    const ElemBits = @Int(.unsigned, @bitSizeOf(E));
    const backing_ptr: *align(a) B = @ptrCast(ptr); // (this might not actually be allowed, just for demonstration purposes)
    const bits: Bits = @bitCast(backing_ptr.*);
    const elem_bits: ElemBits = @truncate(bits >> o);
    return @bitCast(elem_bits);
}

fn store(ptr: *align(a) packed(B, o) E, elem_val: E) void {
    const Bits = @Int(.unsigned, @bitSizeOf(B));
    const ElemBits = @Int(.unsigned, @bitSizeOf(E));
    const elem_mask: Bits = @as(Bits, ~@as(ElemBits, 0)) << o;
    const backing_ptr: *align(a) B = @ptrCast(ptr); // (this might not actually be allowed, just for demonstration purposes)
    const old_bits: Bits = @bitCast(backing_ptr.*);
    const elem_bits: ElemBits = @bitCast(elem_val);
    const new_bits: Bits = (old_bits & ~elem_mask) | (@as(Bits, elem_bits) << o);
    backing_ptr.* = @bitCast(new_bits);
}
```
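To make the generic sketch concrete, here's the same recipe specialized by hand, in status-quo Zig, to the `*packed(u35, 10) u9` example from earlier; `loadU9`/`storeU9` are hypothetical helpers that take the backing pointer directly, standing in for whatever the compiler would emit:

```zig
// Hand-specialized `load`/`store` for a hypothetical `*packed(u35, 10) u9`.
// `B` is already an integer here, so the `@bitCast`s from the generic version are no-ops.
fn loadU9(backing_ptr: *const u35) u9 {
    return @truncate(backing_ptr.* >> 10); // the `u9` lives at bit offset 10
}

fn storeU9(backing_ptr: *u35, elem_val: u9) void {
    const elem_mask: u35 = @as(u35, ~@as(u9, 0)) << 10; // bits 10..18
    backing_ptr.* = (backing_ptr.* & ~elem_mask) | (@as(u35, elem_val) << 10);
}
```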
And, we've unified pointers into packed structs and pointers into vectors under one roof. Lovely!
One thing I've not mentioned yet: the compiler canonicalizes the `BackingType` into either an integer type or a vector type. That's because things like packed structs all have the same layout as their backing integer, so including the actual struct type in the pointer type would be redundant information. By "the compiler canonicalizes it", I mean that it is permitted to write `*packed(packed struct(u32) { ... }, 4) u8`, but the compiler simplifies this type to the equivalent `*packed(u32, 4) u8`; these types will compare equal, and printing the type (`@compileLog`/`@typeName`) will print the latter.
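As a hypothetical illustration of that (it uses the proposed syntax, so of course it doesn't compile today), the struct-typed spelling and its canonical integer-typed form would name the same type:

```zig
const std = @import("std");

const S = packed struct(u32) {
    lo: u4,
    mid: u8, // starts at bit offset 4
    hi: u20,
};

comptime {
    // Both spellings canonicalize to `*packed(u32, 4) u8`, so they compare equal.
    std.debug.assert(*packed(S, 4) u8 == *packed(u32, 4) u8);
}
```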
Also, it would be possible to use `:` as the expression separator inside `packed(...)` instead of a comma:

```zig
*packed(u35:10) u9
*packed(@Vector(3, u8):8) u8
```
I might prefer this, because it visually distinguishes it a bit better in the vector case, but then it's kind of a random syntax. Feel free to bikeshed this in the comments. I'll assume a comma for now, since that seems like the obvious choice.
Runtime Vector Stores
There's one capability we lose in this proposal. Right now, Zig lets you write code like this:
```zig
var runtime_idx: usize = 1; // this is runtime-known

test "assign to runtime vector index" {
    var vec: @Vector(4, u2) = @splat(0);
    vec[runtime_idx] = 1; // <---- this!
}
```
Well, given that the line in question works by taking a pointer `&vec[runtime_idx]`, how does that work? Bit pointers need a comptime-known bit offset! There's yet another hacked-on extension to the bit-pointer syntax: the vector index can be a special value representing "runtime-known", which the compiler represents with a question mark.
```zig
var runtime_idx: usize = 1; // this is runtime-known

test "runtime vector index pointer" {
    var vec: @Vector(4, u2) = @splat(0);
    @compileLog(@TypeOf(&vec[runtime_idx])); // @as(type, *align(1:0:4:?) u2)
}
```
In theory, you're only allowed to store to this pointer, and only in a way where the compiler can "track" the index which was used to create it; there is a compile error for loading from it, or from storing to it when the index has been "lost". In practice, I think there are some broken interactions, and the situations in which this "tracking" succeeds manage to expose subtle compiler implementation details. All in all, this feature is pretty broken.
I am intentionally not proposing a replacement for this syntax here.
Storing values into arbitrary runtime-known elements of a vector seems like a pretty niche use case, and not really something worth doing at all. In fact, it seems like a potential antipattern; ideally, all operations on a vector should be SIMD operations. Accessing a single element based on a runtime index should be extremely rare. So, in the rare case that this is needed, you can implement it in terms of other language features:
```zig
// old
fn vecWithChangedElem(v: @Vector(5, u32), idx: usize, elem_val: u32) @Vector(5, u32) {
    var res = v;
    res[idx] = elem_val;
    return res;
}

// new
fn vecWithChangedElem(v: @Vector(5, u32), idx: usize, elem_val: u32) @Vector(5, u32) {
    // Build a predicate vector with `true` only at `idx`, then select.
    const pred: @Vector(5, bool) = @bitCast(@as(u5, 1) << @intCast(idx));
    const splatted: @Vector(5, u32) = @splat(elem_val);
    return @select(u32, pred, splatted, v);
}
```
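And a quick hypothetical sanity check that either version behaves as expected (the test name and values are made up):

```zig
const std = @import("std");

test "vecWithChangedElem replaces exactly one element" {
    const v: @Vector(5, u32) = .{ 10, 20, 30, 40, 50 };
    var idx: usize = 3;
    _ = &idx; // keep the index runtime-known
    const r = vecWithChangedElem(v, idx, 99);
    try std.testing.expectEqual(@as(u32, 99), r[3]);
    try std.testing.expectEqual(@as(u32, 20), r[1]);
}
```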
It's true that the new one is a bit trickier to understand, but it simplifies the language, this should be an extremely rare operation anyway, and LLVM actually seems to be a little better at optimizing it, at least sometimes: https://zig.godbolt.org/z/48haGorej
I don't think there's any justifiable reason for `vec[runtime_idx]` or `&vec[runtime_idx]` to not be a compile error. Currently, the former is, while the latter is not; this proposal brings the two forms into alignment, at the expense of making a rare and often-inefficient operation perhaps require a helper function. That seems like a completely reasonable tradeoff to me.