Skip to content

transpile: Support unaligned reads and writes #1257

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

folkertdev
Copy link
Contributor

This is very work-in-progress, but I think I need some pointers here. I'll add some inline comments.

Comment on lines +1645 to +1666
encode_entry(UO, TagUnaryOperator, childIds, [this, UO](CborEncoder *array) {
cbor_encode_string(array, UO->getOpcodeStr(UO->getOpcode()).str());
cbor_encode_boolean(array, UO->isPrefix());

if (UO->getOpcode() == UO_Deref) {
QualType eltTy = UO->getSubExpr()->getType()->getPointeeType();
CharUnits align = Context->getTypeAlignInChars(eltTy);

if (auto *TD = eltTy->getAs<TypedefType>()) {
QualType naturalTy = TD->getDecl()->getUnderlyingType();
CharUnits naturalAlign = Context->getPreferredTypeAlignInChars(naturalTy);

if (align < naturalAlign) {
cbor_encode_int(array, align.getQuantity());
} else {
cbor_encode_int(array, -1);
}

}
}
});

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have LLM'd my way to a solution here, this is probably terrible c++ code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not super familiar with libclang either. Looks like it might work? I can look into it more later.

Comment on lines +921 to +935
} else if unaligned {
// We should use read_unaligned here:
// mk().method_call_expr(val, "read_unaligned", vec![]);
// but that interferes with `write_unaligned`

let mut val =
mk().unary_expr(UnOp::Deref(Default::default()), val);

// If the type on the other side of the pointer we are dereferencing is volatile and
// this whole expression is not an LValue, we should make this a volatile read
if lrvalue.is_rvalue() && cqual_type.qualifiers.is_volatile
{
val = self.volatile_read(val, cqual_type)?
}
Ok(val)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused by the value of lrvalue. I'd expect ptr to be an lvalue in e.g. *ptr = value, and an rvalue in the case return *ptr, but both are apparently lvalues. Hence, I don't see how to differentiate between reads and writes here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding to my confusion, this will not actually emit a read_volatile. Maybe I misunderstand what the attribute does here but I think that's a bug?

uint32_t volatile_read(volatile void* ptr) {
    return *((volatile unaligned_uint32*)ptr);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm yeah this just seems wrong https://c.godbolt.org/z/sPcMM4b1T. The lrvalue is always LValue (I think a pointer dereference is just always an lvalue currently?), so that volatile read is never emitted.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that's a bug, right? I can open an issue for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the volatile code here just for trying to figure out how to emit read_aligned correctly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, nevermind, it's already existing code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in return *ptr, both ptr and *ptr are lvalues in C. See https://godbolt.org/z/bxrs7dczs. return *ptr is the rvalue. So the rvalue part of it comes not from the dereference, but from what's done with the dereference, if that makes sense. Like in *p + 1, the whole *p + 1 expression is an rvalue, but the *p itself is still an lvalue because *p is writable as well, i.e., *p = 2 is well-formed while *p + 1 = 2 is not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, that is what makes this code confusing: I don't think it can ever be an rvalue in a way that is relevant, so that branch is unreachable.

Also in general that means we don't have great way of distinguishing whether a * means a read, write, or both. That is fine when a C * is translated with a rust *, but when you want to use the methods, it's a problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also there are already a bunch of open issues wrt volatile reads and writes. They just don't actually currently work as far as I can tell.

Comment on lines +19 to +21
pub unsafe extern "C" fn unaligned_read(mut ptr: *const std::ffi::c_void) -> uint32_t {
return *(ptr as *const unaligned_uint32);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this is still wrong. I can make this one work, but then the unaligned write below is no longer correct.

@kkysen kkysen self-requested a review June 30, 2025 16:59
Copy link
Contributor

@kkysen kkysen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I took a very long time to review this.

Comment on lines +1657 to +1661
if (align < naturalAlign) {
cbor_encode_int(array, align.getQuantity());
} else {
cbor_encode_int(array, -1);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overalignment can be useful to handle as well, so maybe just cbor_encode_uint both the actual and natural alignment?

Comment on lines +1645 to +1666
encode_entry(UO, TagUnaryOperator, childIds, [this, UO](CborEncoder *array) {
cbor_encode_string(array, UO->getOpcodeStr(UO->getOpcode()).str());
cbor_encode_boolean(array, UO->isPrefix());

if (UO->getOpcode() == UO_Deref) {
QualType eltTy = UO->getSubExpr()->getType()->getPointeeType();
CharUnits align = Context->getTypeAlignInChars(eltTy);

if (auto *TD = eltTy->getAs<TypedefType>()) {
QualType naturalTy = TD->getDecl()->getUnderlyingType();
CharUnits naturalAlign = Context->getPreferredTypeAlignInChars(naturalTy);

if (align < naturalAlign) {
cbor_encode_int(array, align.getQuantity());
} else {
cbor_encode_int(array, -1);
}

}
}
});

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not super familiar with libclang either. Looks like it might work? I can look into it more later.

Comment on lines +1233 to +1246
AddressOf, // &x
Deref { unaligned: bool }, // *x
Plus, // +x
PostIncrement, // x++
PreIncrement, // ++x
Negate, // -x
PostDecrement, // x--
PreDecrement, // --x
Complement, // ~x
Not, // !x
Real, // [GNU C] __real x
Imag, // [GNU C] __imag x
Extension, // [GNU C] __extension__ x
Coawait, // [C++ Coroutines] co_await x
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The formatting here is annoying 😩. Could you add another commit that just makes all of the same-line // comments into proper doc comments above the variant with a newline after each one? Or I could if you prefer.

@@ -822,7 +837,7 @@ impl<'c> Translation<'c> {

match arg_kind {
// C99 6.5.3.2 para 4
CExprKind::Unary(_, c_ast::UnOp::Deref, target, _) => {
CExprKind::Unary(_, c_ast::UnOp::Deref { unaligned: _ }, target, _) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
CExprKind::Unary(_, c_ast::UnOp::Deref { unaligned: _ }, target, _) => {
CExprKind::Unary(_, c_ast::UnOp::Deref { .. }, target, _) => {

If we don't care about the alignment, could it just be this? Like you did for the other match.

Comment on lines +921 to +935
} else if unaligned {
// We should use read_unaligned here:
// mk().method_call_expr(val, "read_unaligned", vec![]);
// but that interferes with `write_unaligned`

let mut val =
mk().unary_expr(UnOp::Deref(Default::default()), val);

// If the type on the other side of the pointer we are dereferencing is volatile and
// this whole expression is not an LValue, we should make this a volatile read
if lrvalue.is_rvalue() && cqual_type.qualifiers.is_volatile
{
val = self.volatile_read(val, cqual_type)?
}
Ok(val)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that's a bug, right? I can open an issue for it.

Comment on lines +921 to +935
} else if unaligned {
// We should use read_unaligned here:
// mk().method_call_expr(val, "read_unaligned", vec![]);
// but that interferes with `write_unaligned`

let mut val =
mk().unary_expr(UnOp::Deref(Default::default()), val);

// If the type on the other side of the pointer we are dereferencing is volatile and
// this whole expression is not an LValue, we should make this a volatile read
if lrvalue.is_rvalue() && cqual_type.qualifiers.is_volatile
{
val = self.volatile_read(val, cqual_type)?
}
Ok(val)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the volatile code here just for trying to figure out how to emit read_aligned correctly?

Comment on lines +921 to +935
} else if unaligned {
// We should use read_unaligned here:
// mk().method_call_expr(val, "read_unaligned", vec![]);
// but that interferes with `write_unaligned`

let mut val =
mk().unary_expr(UnOp::Deref(Default::default()), val);

// If the type on the other side of the pointer we are dereferencing is volatile and
// this whole expression is not an LValue, we should make this a volatile read
if lrvalue.is_rvalue() && cqual_type.qualifiers.is_volatile
{
val = self.volatile_read(val, cqual_type)?
}
Ok(val)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, nevermind, it's already existing code.

Comment on lines +921 to +935
} else if unaligned {
// We should use read_unaligned here:
// mk().method_call_expr(val, "read_unaligned", vec![]);
// but that interferes with `write_unaligned`

let mut val =
mk().unary_expr(UnOp::Deref(Default::default()), val);

// If the type on the other side of the pointer we are dereferencing is volatile and
// this whole expression is not an LValue, we should make this a volatile read
if lrvalue.is_rvalue() && cqual_type.qualifiers.is_volatile
{
val = self.volatile_read(val, cqual_type)?
}
Ok(val)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in return *ptr, both ptr and *ptr are lvalues in C. See https://godbolt.org/z/bxrs7dczs. return *ptr is the rvalue. So the rvalue part of it comes not from the dereference, but from what's done with the dereference, if that makes sense. Like in *p + 1, the whole *p + 1 expression is an rvalue, but the *p itself is still an lvalue because *p is writable as well, i.e., *p = 2 is well-formed while *p + 1 = 2 is not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants