Skip to content

Fix alignment for values larger than 32 bit on xtensa targets #195

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

haileys
Copy link

@haileys haileys commented Sep 22, 2023

32 bit is the maximum alignment required for any object on xtensa, indeed this is the maximum alignment that esp-idf's heap allocator guarantees.

@MabezDev
Copy link
Member

Thanks for the PR!

I don't think this is correct, the Xtensa ISA (here, pg 687) dictates that 64bit ints and 64bit floats have 8-byte alignment. For u128, because this isn't a common type in C compilers we defaulted to clang and gcc's current behaviour.

The heap allocator guarantees 32bit word-aligned allocations mostly due to the fact that some address spaces do not offer byte-level access, and it's often required for certain hardware operations, like using the DMA engine.

Could you further explain your reasoning behind the PR?

@workingjubilee
Copy link

workingjubilee commented Sep 22, 2023

Er. Perhaps I am just being a simpleton here, but how can anyone allocate 64-bit objects if the heap allocator cannot guarantee 64-bit-aligned allocations? I realize Box<i64> is a rare sight, but it doesn't seem like it should be invalid.

@MabezDev
Copy link
Member

MabezDev commented Sep 22, 2023

Er. Perhaps I am just being a simpleton here, but how can anyone allocate 64-bit objects if the heap allocator cannot guarantee 64-bit-aligned allocations? I realize Box is a rare sight, but it doesn't seem like it should be invalid.

The heap allocator in question is the esp-idf one (I believe @haileys is using the std sport, please correct me if I'm wrong!), which definitely supports allocating larger than 32bit structures, but it defaults to an alignment of 32bit for various reasons, some I mentioned above.

I think there are two distinctly different topics here, although closely linked. This PR changes the calling convention for Xtensa by changing the required alignment of certain data types, which will in turn, change how these data types are stored in registers at compile time, and how they are stored on the stack. A by-product of this change is that these new primitive types will have different alignments, which will also affect how they should be allocated on the heap which I believe is what @haileys is trying to do? I'd like to know why they wish to change the alignment, and what their end goal is :)

@haileys
Copy link
Author

haileys commented Sep 22, 2023

Thanks for the fast reply @MabezDev!

I don't think this is correct, the Xtensa ISA (here, pg 687) dictates that 64bit ints and 64bit floats have 8-byte alignment. For u128, because this isn't a common type in C compilers we defaulted to clang and gcc's current behaviour.

I wasn't aware of this, thanks for pointing it out!

I think then there might be a mismatch between what the ISA specifies and what the ESP platform specifies. From my reading of the Espessif docs, it seems like (except for special cases where the hardware calls for much larger alignments), 32 bit is the maximum alignment ever mentioned.

According to page 26 ESP32 technical reference manual:

The CPU can access data bus addresses via aligned or non-aligned
byte, half-word and word read-and-write operations. The CPU can read and write data through the instruction
bus, but only in a word aligned manner; non-word-aligned access will cause a CPU exception.

However, if the ISA spec says one thing, I do agree that we can't just change things here and we need to look for a different solution.

Perhaps a little more about my use case would be helpful in determining what to do here:

I'm using the bytemuck crate to cast a 32-bit aligned byte buffer to a struct. The struct is repr(C) and contains u64 fields. Bytemuck panics when calling from_bytes unless the buffer is 64-bit aligned.

My understanding is that on an esp32, there is no safety issue or performance penalty associated with reading a 64 bit value from a 32 bit aligned address, no matter which type of RAM that data is located in. Additionally, I don't control the allocation of this buffer. The buffer my code receives is guaranteed to be 32 bit aligned, but ensuring 64 bit alignment would require me to reallocate + copy the buffer before reading, which is undesirable.

Bytemuck should panic if this data is not 32-bit aligned, since it can't know if the pointer it's casting is a pointer to IRAM which only allows 32 bit access, but panicking because stricter alignment requirements are not satisfied is not necessary I believe. The panicking code in bytemuck uses align_of, which is what led me here.

I see that LLVM allows both ABI alignment as well as a "preferred" alignment to be set. I'm not familiar with the semantics of that, but could it possibly be helpful in this situation?

If not then I'm left scratching my head, because as you point out the ABI/calling convention is quite clear about alignment requirements, but those requirements are stricter than the alignment requirements for data access. I'd love to find some kind of solution, but I'm also operating at the very edge of my knowledge here so any tips are appreciated also :)

@MabezDev
Copy link
Member

My understanding is that on an esp32, there is no safety issue or performance penalty associated with reading a 64 bit value from a 32 bit aligned address, no matter which type of RAM that data is located in. Additionally, I don't control the allocation of this buffer. The buffer my code receives is guaranteed to be 32 bit aligned, but ensuring 64 bit alignment would require me to reallocate + copy the buffer before reading, which is undesirable.

On some architectures, this would be possible, but Xtensa (generally, there is an unaligned access extension but no esp32's have it) is a strict alignment architecture. The reason your assumption above wouldn't work would be cases where your struct could fit into the available registers when calling a function. In this case, the u64 would be in registers, not on the stack, and the Xtensa arch defines how u64s are to be stored in registers (it needs to start in an even-numbered register) which is different to just storing a u32< in a single register. This is why we can't change the ABI to solve this issue.

All of the above is a bit orthogonal to the actual issue though. Why can't you ask whatever is allocating to provide memory with a given alignment? Or if you really can't control it, ask for more than needed and align yourself, so that we prevent the re-alloc, at the cost of a bit more memory usage.

@ivmarkov ivmarkov mentioned this pull request Jan 21, 2024
@ivmarkov
Copy link

@MabezDev @haileys As confirmed by @igrr and as evidenced by esp-rs/esp-idf-sys#278 , the Espressif xtensa GCC toolchain is operating with 4 byte alignment for both 64 bit types, as well as 128 bit types. For me this means:

  • The xtensa ISA variation used in the Espressif chips is probably deviating from the original xtensa ISA spec in that it "tolerates" these types being only 4 byte aligned
  • We should do the same, or else any interop with C code compiled with GCC is at risk

@ivmarkov
Copy link

The only difference between this and my other PR is that this one treats the f64 type to the 4 bytes' alignment (as opposed to its standard 8 bytes alignment) as well.

... which I think we should actually do as well. So I'll close my PR in favor of this one.

@igrr
Copy link

igrr commented Jan 22, 2024

The xtensa ISA variation used in the Espressif chips is probably deviating from the original xtensa ISA spec in that it "tolerates" these types being only 4 byte aligned

I don't think this is the case. Since this is a 32-bit architecture, there are no CPU instructions which deal with larger-than-32-bit values. The CPU simply doesn't care about alignment of 64-bit ints or floats because it never has to deal with them directly — only with one of their 32-bit parts at a time.

I am sorry for setting you in the wrong direction with my comment yesterday @ivmarkov. I have looked at GCC config again and now I see that the 8-byte data types are indeed naturally aligned. This is confirmed by checking the alignment of that structure in GCC (compiler explorer).

So this bug is on esp32-camera side, and I suspect that the similar issue may occur many other places. I don't think we have any kind of a warning or a runtime check in place to check for mismatch between the required structure alignment and the actual pointer value. (Edit: except for UBSAN, which should catch this, but not many people enable it since it increases code size a lot.) As long as the pointer is 32-bit aligned, accessing a larger-than-4-bytes value by this pointer will work, since in practice every operations on such value will be represented by multiple 32-bit operations.

me-no-dev added a commit to espressif/esp32-camera that referenced this pull request Jan 22, 2024
me-no-dev added a commit to espressif/esp32-camera that referenced this pull request Jan 22, 2024
* Align the frame buffers to the structure alignment 

cc: esp-rs/esp-idf-sys#278
cc: esp-rs/rust#195

* Include stdalign.h
@me-no-dev
Copy link

@ivmarkov a fix has been added to esp32-camera here. Can you please confirm that it fixes the issue?

@ivmarkov
Copy link

The xtensa ISA variation used in the Espressif chips is probably deviating from the original xtensa ISA spec in that it "tolerates" these types being only 4 byte aligned

I don't think this is the case. Since this is a 32-bit architecture, there are no CPU instructions which deal with larger-than-32-bit values. The CPU simply doesn't care about alignment of 64-bit ints or floats because it never has to deal with them directly — only with one of their 32-bit parts at a time.

I am sorry for setting you in the wrong direction with my comment yesterday @ivmarkov. I have looked at GCC config again and now I see that the 8-byte data types are indeed naturally aligned. This is confirmed by checking the alignment of that structure in GCC (compiler explorer).

@igrr

Well, for STD and rustc specifically either way is fine, because it uses ESP IDF's malloc for stuff that needs <= 4 bytes alignment, or else it goes to ESP IDF's memalign. And this is all nicely hidden and abstracted and user should not think or care about that.

But I'm a bit worried that the ESP IDF code might be sprinkled with mallocs left and right and if some of the allocated structures happen to contain i64 or i128 that now means a problem with rustc on non-release mode as it asserts on incorrect alignment, at least since some time.

Isn't it therefore easier and just more correct to fix both the xtensa targets in rustc, as well as the GCC ones to NOT align on more than 4 bytes? After all - and as you say - there are NO i64 and i128 types from the point of view of the CPU anyway, so why - in the first place - we are aligning these to bigger alignments then?

So this bug is on esp32-camera side, and I suspect that the similar issue may occur many other places. I don't think we have any kind of a warning or a runtime check in place to check for mismatch between the required structure alignment and the actual pointer value. (Edit: except for UBSAN, which should catch this, but not many people enable it since it increases code size a lot.) As long as the pointer is 32-bit aligned, accessing a larger-than-4-bytes value by this pointer will work, since in practice every operations on such value will be represented by multiple 32-bit operations.

Ditto.

@ivmarkov
Copy link

@ivmarkov a fix has been added to esp32-camera here. Can you please confirm that it fixes the issue?

That's probably for @ViktorWb to confirm.

@igrr
Copy link

igrr commented Jan 22, 2024

Isn't it therefore easier and just more correct to fix both the xtensa targets in rustc, as well as the GCC ones to NOT align on more than 4 bytes?

I can answer about GCC, at least: we have been trying to reduce the number of ESP-specific patches we keep on top of the upstream Xtensa target, so we'll probably not follow this route.

But I'm a bit worried that the ESP IDF code might be sprinkled with mallocs left and right and if some of the allocated structures happen to contain i64

That is a valid concern! As far as I can tell, running tests with UBSAN enabled should help us find these cases. As we already have the minimal UBSAN runtime in IDF, we'll try to enable these checks in CI and will then gradually fix all these violations.

@MabezDev
Copy link
Member

I don't think we want to deviate from the Xtensa ISA here. We spent a lot of time getting it right in both Xtensa Clang and Rust, and we have a small test suite to ensure it is correct: https://github.com/MabezDev/esp-rustc-abi-bug. We haven't had any ABI-related issues since, and seeing the report come in over the weekend worried me a little 😅. I see that the real issue was inside esp-idf in this case. I understand your concern @ivmarkov about finding other possible breakages like this in the future, but I think @igrr's suggestion of running UBSAN in esp-idf's CI is probably the more appropriate response.

With this in mind, I think I will close this PR. @haileys @ivmarkov please file any heap-allocated alignment issues you find along your travels against esp-idf please.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants