@statham-arm statham-arm commented Jul 3, 2025

The document currently says of CPSR, along with the VFP (and FPA) control registers, "It is considered unlikely that these will be needed for producing a stack back-trace in a debugger."

However, CPSR can be required for producing a correct stack backtrace. This occurs due to conditional return sequences, for example

function:
  PUSH   {r4, r5, r6, lr}
  SUB    sp, sp, #64
  // ... do stuff ...
  CMP    this, that       // we will return early if they are equal
  ADDEQ  sp, sp, #64
  POPEQ  {r4, r5, r6, pc}
  // ... now, if we didn't return, continue using our stack frame

In between ADDEQ and POPEQ, the state of the stack depends on the flags in CPSR. If the Z flag is set, then the ADDEQ has happened, and the POPEQ is about to; if Z is clear, neither one has happened. In this example the function has no frame pointer, so the CFA is defined as an offset from sp, and what offset depends on whether we just added 64 to sp.
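
As a rough illustration of the arithmetic in the paragraph above, the following sketch works out the CFA offset at the point between ADDEQ and POPEQ. It assumes the usual convention that the CFA is the value of sp on entry to the function; the constant and function names are made up for this example.

```python
# Hypothetical model of the example function's frame; names are illustrative only.
SAVED_REGS_BYTES = 4 * 4   # PUSH {r4, r5, r6, lr}: four 32-bit registers
LOCALS_BYTES = 64          # SUB sp, sp, #64


def cfa_offset_between_addeq_and_popeq(z_flag_set: bool) -> int:
    """Return N such that CFA = sp + N at the point between ADDEQ and POPEQ.

    Assumes the CFA is the value sp had on entry to the function.
    """
    if z_flag_set:
        # ADDEQ executed: the 64 bytes of locals have already been released.
        return SAVED_REGS_BYTES
    # ADDEQ was skipped: the whole frame is still in place.
    return SAVED_REGS_BYTES + LOCALS_BYTES


print(cfa_offset_between_addeq_and_popeq(True))   # 16
print(cfa_offset_between_addeq_and_popeq(False))  # 80
```

The same PC therefore maps to two different offsets, which is why a rule keyed on PC alone can only be right for one of the two cases.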

This style of conditional return has always been possible, since it depends only on the earliest features of the Arm instruction set. The accepted idiom in my experience has always been to write stack frame information that is valid for one case but not the other. Introducing a DWARF register number for CPSR makes it possible to write stack frame information that is valid in both cases. For example, you could write an expression along these lines, which uses the 4 bits at the top of CPSR to decide which bit of the constant bitmask to test, and then each Arm condition code is representable as a different bitmask:

  DW_OP_breg_13                    // fetch sp
  DW_OP_constu #bitmask            // bit mask for the particular condition
  DW_OP_bregx #CPSR                // fetch CPSR
  DW_OP_const1u #28
  DW_OP_shr                        // make (CPSR >> 28), just the NZCV bits
  DW_OP_shr                        // shift bit mask right by the NZCV value
  DW_OP_const1u #1
  DW_OP_and                        // AND with 1 to isolate low bit
  DW_OP_bra #label-offset-else     // if nonzero, branch over the next constant
  DW_OP_constu #offset2            // load one possible value to add to sp
  DW_OP_skip #label-offset-end     // and skip over the other constant
label-offset-else:
  DW_OP_constu #offset1            // load the other value to add to sp
label-offset-end:
  DW_OP_minus                      // subtract either offset2 or offset1 from sp
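
To make the data flow of the expression above easier to follow, here is a minimal sketch that mirrors it step by step on an explicit stack. It is not a DWARF evaluator: the function name and the sample values (sp, the EQ bitmask 0xF0F0, the two offsets) are assumptions chosen for illustration, and the final subtraction simply follows the expression as written.

```python
def eval_cfa_expression(sp, cpsr, bitmask, offset1, offset2):
    """Mirror the expression above, one stack operation per DWARF op."""
    stack = []
    stack.append(sp)                 # DW_OP_breg_13: fetch sp
    stack.append(bitmask)            # DW_OP_constu #bitmask
    stack.append(cpsr & 0xFFFFFFFF)  # DW_OP_bregx #CPSR (32-bit register)
    stack.append(28)                 # DW_OP_const1u #28

    count = stack.pop()              # DW_OP_shr: CPSR >> 28 leaves NZCV
    value = stack.pop()
    stack.append(value >> count)

    count = stack.pop()              # DW_OP_shr: bitmask >> NZCV
    value = stack.pop()
    stack.append(value >> count)

    stack.append(1)                  # DW_OP_const1u #1
    rhs = stack.pop()                # DW_OP_and: isolate the low bit
    lhs = stack.pop()
    stack.append(lhs & rhs)

    if stack.pop() != 0:             # DW_OP_bra: taken when the bit is nonzero
        stack.append(offset1)        # label-offset-else
    else:
        stack.append(offset2)        # fall-through, then DW_OP_skip

    rhs = stack.pop()                # DW_OP_minus: second entry minus top entry
    lhs = stack.pop()
    stack.append(lhs - rhs)
    return stack.pop()


Z_FLAG = 1 << 30    # Z is bit 30 of CPSR
EQ_MASK = 0xF0F0    # bit i is set when NZCV value i has Z set

print(hex(eval_cfa_expression(0x1000, Z_FLAG, EQ_MASK, 16, 80)))  # 0xff0
print(hex(eval_cfa_expression(0x1000, 0, EQ_MASK, 16, 80)))       # 0xfb0
```

With these sample inputs, offset1 is selected when the condition holds and offset2 when it does not.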

Of course, debuggers will take time to catch up. But a more interesting use case for being able to precisely describe stack situations like this is automatic static checkers for handwritten assembly language with handwritten call frame directives, which verify the semantics of the instructions against the call frame updates next to them. A debugger might almost never stop at the difficult location above, but a static checker will traverse it every time, and needs a way to avoid getting confused.

The document says of CPSR, along with the VFP (and FPA) control
registers, "It is considered unlikely that these will be needed for
producing a stack back-trace in a debugger."

However, CPSR _can_ be required for producing a correct stack
backtrace. This occurs due to conditional return sequences, for
example

```
function:
  PUSH   {r4, r5, r6, lr}
  SUB    sp, sp, #64
  // ... do stuff ...
  CMP    this, that       // we will return early if they are equal
  ADDEQ  sp, sp, #64
  POPEQ  {r4, r5, r6, pc}
  // ... now, if we didn't return, continue using our stack frame
```

In between ADDEQ and POPEQ, the state of the stack depends on the
flags in CPSR. If the Z flag is set, then the ADDEQ has happened, and
the POPEQ is about to; if Z is clear, neither one has happened. In
this example the function has no frame pointer, so the CFA is defined
as an offset from sp, and _what_ offset depends on whether we just
added 64 to sp.

This style of conditional return has always been possible, since it
depends only on the earliest features of the Arm instruction set. The
accepted idiom in my experience has always been to write stack frame
information that is valid for one case but not the other. Introducing
a DWARF register number for CPSR makes it possible to write stack
frame information that is valid in both cases. For example, you could
write an expression along these lines, which uses the 4 bits at the
top of CPSR to decide which bit of the constant `bitmask` to test, and
then each Arm condition code is representable as a different bitmask:

```
  DW_OP_breg_13                    // fetch sp
  DW_OP_constu #bitmask            // bit mask for the particular condition
  DW_OP_bregx #CPSR                // fetch CPSR
  DW_OP_const1u #28
  DW_OP_shr                        // make (CPSR >> 28), just the NZCV bits
  DW_OP_shr                        // shift bit mask right by the NZCV value
  DW_OP_const1u #1
  DW_OP_and                        // AND with 1 to isolate low bit
  DW_OP_bra #label-offset-else     // if nonzero, branch over the next constant
  DW_OP_constu #offset2            // load one possible value to add to sp
  DW_OP_skip #label-offset-end     // and skip over the other constant
label-offset-else:
  DW_OP_constu #offset1            // load the other value to add to sp
label-offset-end:
  DW_OP_minus                      // add either offset2 or offset1 to sp
```

Of course, debuggers will take time to catch up. But a more
interesting use case for being able to precisely describe stack
situations like this is automatic static checkers for handwritten
assembly language with handwritten call frame directives, which verify
the semantics of the instructions against the call frame updates next
to them. A debugger might almost never stop at the difficult location
above, but a static checker will traverse it every time, and needs a
way to avoid getting confused.

@walkerkd walkerkd left a comment

Last comment on the example code should probably be:
"subtract either offset2 or offset1 from sp"

@smithp35

smithp35 commented Jul 3, 2025

Adding the register seems reasonable. We're not expecting to be adding large numbers of new registers in AArch32.

Although out of scope for AADWARF32, I'm guessing you will need additional cfi directive(s) to construct the DWARF expression?

@walkerkd

walkerkd commented Jul 4, 2025

Since DWARF 3 there have been the CFI instructions DW_CFA_expression, DW_CFA_val_expression and DW_CFA_def_cfa_expression, which can take a DWARF expression like the one mentioned in the original comment.

However, the Linux .eh_frame specification currently only defines support for DW_CFA_expression (it uses the DWARF 2 specification as the base standard with some DWARF 3 extensions): https://refspecs.linuxbase.org/LSB_4.0.0/LSB-Core-generic/LSB-Core-generic/dwarfext.html#DWARFEHENCODING

@statham-arm

I think that's fine, because I don't know of any reason why .eh_frame would need to use an expression based on CPSR for determining the stack layout.

It only comes up in .debug_frame because a debugger might be made to stop between any two instructions (especially if you're debugging handwritten assembly, which is the class of code most likely to do tricks like this conditional return sequence in the first place).

But in .eh_frame, when you're unwinding the stack due to a thrown exception, you'd surely only need to unwind through a function at places where an exception might be thrown, which means function calls. And an ABI-compliant function call clobbers CPSR (or at least can't be trusted not to). So if you need CPSR to know your own stack layout before a function call, then on return from the call, you've forgotten it yourself, and won't even be able to clean up your own stack frame at return time.

@rearnsha

rearnsha commented Jul 4, 2025

I see no value for this in the stack unwinding, since the CPSR isn't preserved over a call and, in general, there's no way of knowing what result a comparison will produce at some future time. A frame that was popped in a different way that is dependent on the condition codes would be unusual in the extreme and I don't think dwarf could describe that.

On the other hand, this might be useful in some expression cases for debuggers, but perhaps we would need to be clear that it does not form part of the set of registers described by unwinding.

@statham-arm

> Although out of scope for AADWARF32, I'm guessing you will need additional cfi directive(s) to construct the DWARF expression?

I hadn't thought that far, but yes, I suppose so. I don't see any .cfi_foo directive in the current gas docs which constructs a DWARF expression. I think it would be possible to byte-encode one via .cfi_escape, but that's very user-unfriendly!

In fact even if you could write something using an ordinary expression syntax, along the lines of

.cfi_def_cfa_offset (((0xF5FA >> (CPSR >> 28)) & 1) ? offset_if_LE : offset_if_GT)

then it would still look pretty horrible! For practical use you'd want a custom Arm-specific directive that let you directly write a condition code like GT or LO and would construct the right expression.
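
As a hedged sketch of what such a directive would have to compute, the snippet below derives the 16-bit mask for each condition code from its NZCV predicate, where bit i of the mask is set when the condition holds for the NZCV value i (that is, CPSR >> 28). The table and function names are invented for this example; the predicates are the standard Arm condition-code definitions.

```python
# Standard Arm condition predicates over the N, Z, C, V flags.
CONDITIONS = {
    "EQ": lambda n, z, c, v: z,
    "NE": lambda n, z, c, v: not z,
    "HS": lambda n, z, c, v: c,
    "LO": lambda n, z, c, v: not c,
    "MI": lambda n, z, c, v: n,
    "PL": lambda n, z, c, v: not n,
    "VS": lambda n, z, c, v: v,
    "VC": lambda n, z, c, v: not v,
    "HI": lambda n, z, c, v: c and not z,
    "LS": lambda n, z, c, v: (not c) or z,
    "GE": lambda n, z, c, v: n == v,
    "LT": lambda n, z, c, v: n != v,
    "GT": lambda n, z, c, v: (not z) and n == v,
    "LE": lambda n, z, c, v: z or n != v,
}


def condition_bitmask(name: str) -> int:
    """Return the mask whose bit i is set when `name` holds for NZCV == i."""
    mask = 0
    for nzcv in range(16):
        # CPSR >> 28 places N in bit 3, Z in bit 2, C in bit 1 and V in bit 0.
        n = bool(nzcv & 8)
        z = bool(nzcv & 4)
        c = bool(nzcv & 2)
        v = bool(nzcv & 1)
        if CONDITIONS[name](n, z, c, v):
            mask |= 1 << nzcv
    return mask


print(hex(condition_bitmask("LE")))  # 0xf5fa
print(hex(condition_bitmask("EQ")))  # 0xf0f0
```

The LE result matches the 0xF5FA constant used in the expression above, and complementary conditions such as GT and LE produce complementary masks.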

@walkerkd

walkerkd commented Jul 4, 2025

@rearnsha raises a valid point here: as the CPSR is not preserved across calls, any expression using the CPSR register could not be evaluated in any but the topmost stack frame, because the CPSR register location would be "undefined".

@statham-arm

Ah, yes, I see what you mean. Should I add some text along the lines of "CPSR is only for use in expressions, when the stack layout depends on it, and you aren't allowed to write a CFI rule to say where it's stored, because it never is"?

Mind you, this is surely also very like r0–r3: those too aren't supposed to have CFI rules saying where the caller's version lives, but you can use them in CFI anyway, e.g. you could (if you really wanted to) make the CFA based on r3.

@rearnsha

rearnsha commented Jul 7, 2025

It's correct that r0-r3 (and IP) can't be used in unwinding (with the AAPCS conventions); the same applies to d0-d7 and d16-d31.
Of course, you could envisage ABI variants where these might become call-saved and they can certainly all be used in expressions (eg for 'virtual' variables). On that basis I think there's some scope for adding the CPSR.
But that raises further questions, such as should the FPSR also be made available (because FP comparisons don't directly write the CPSR)? What about other status bits?

@statham-arm

Surely the only ABI variants we need to care about for these purposes are ones specified in this collection of ABI documents itself. People making up their own variants outside this specification (if any) shouldn't be surprised when this spec doesn't allocate them the DWARF register numbers they need 🙂 So we can add a register number for FPSR if and when we introduce an ABI variant that specifies it as callee-saved.

But one for CPSR is useful now, even though we don't specify it as callee-saved, because in the deepest stack frame it's possible for the stack layout to depend on it.

I can't see how FPSR, or other status bits such as the MVE predicate register (or even the Q flag), could easily have the same property, because only the NZCV flags in CPSR can be used to conditionalize instructions that change the state of the stack (pops, pushes, modifications to sp). To do conditional stack modification based on any other flags, you'd have to transfer those flags into the CPSR, or test them in some other way that ends up in a conditional branch (like putting them into an integer register and doing a TBNZ), and then the stack layout is back to depending on PC and/or CPSR again.

@rearnsha

rearnsha commented Jul 7, 2025

        vcmpe.f32       s0, s1                   // <- debugger is here
        vmrs    APSR_nzcv, FPSCR
        addeq  sp, sp, #64
        popeq  {r4, r5, r6, pc}

So at the point of the comparison, only the FPSR contains the correct value.

@statham-arm

statham-arm commented Jul 7, 2025

But at the point where only the FPSR contains the result of the comparison, the stack layout doesn't depend on it. The stack layout is independent of the comparison result until after the addeq, and by that time, the flags it depends on are in CPSR, because that's how the addeq managed to do anything conditional at all.

Directly after the vcmpe.f32, FPSR tells you how the stack is going to be different in two instructions' time, but the debugger doesn't need to know that, it only needs to know what it's like now.

@rearnsha

rearnsha commented Jul 7, 2025

And when I reach

popeq  {r4, r5, r6, pc}

The CPSR only tells me what it's going to be in one instruction's time. But I don't need dwarf information to tell me that, I could work it out from the instruction and the current register set itself.

So what's your point exactly?
Dwarf expressions are really intended for things like

 for (int i = 0; i < 100; i++)
   a[i] = 3;

which could be compiled to (assuming a is in r1)

  mov r0, #3
  add r2, r1, #400  // termination value
L0:
  str r0, [r1], #4
  cmp r1, r2
  blt L0

now the compiler can describe i as 100 - ((r2 - r1) / 4) to allow it to reconstruct the user variable after optimization. Absent such a description the debugger can only say that it doesn't know what value i has.
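
The recovery formula can be checked with a small simulation of the loop. This is only a sketch: the base address 0x1000 and the helper names are arbitrary choices for illustration, and it assumes 4-byte array elements as in the assembly above.

```python
def recover_i(r1: int, r2: int) -> int:
    """Reconstruct the source-level i from the two registers."""
    return 100 - ((r2 - r1) // 4)


def simulate(base: int = 0x1000) -> int:
    """Step through the loop, checking recover_i before every store."""
    r1 = base             # a is assumed to be in r1
    r2 = r1 + 400         # add r2, r1, #400
    i = 0
    while True:
        # At the str instruction, the formula should yield the current i.
        assert recover_i(r1, r2) == i
        r1 += 4           # str r0, [r1], #4 (post-indexed write-back)
        i += 1
        if not (r1 < r2): # cmp r1, r2 ; blt L0
            break
    return i


print(simulate())  # 100
```

Note that immediately after the post-indexed store the same formula already yields i + 1, so the description is only exact at certain points within the loop body.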

@statham-arm

> The CPSR only tells me what it's going to be in one instruction's time.

No – not only that. At the point in between the addeq and the popeq, the CPSR tells me what the stack is like now, because it tells me what the addeq has already done as well as what the popeq is about to do.

If Z is set, then it was also set before the addeq, so the addeq executed, and added 64 to the previous value of sp. If Z is clear now, then it was clear before, so the addeq didn't do that. So the locations of the saved registers relative to the current value of sp are at offsets that depend on CPSR.

> Dwarf expressions are really intended for things like [not this]

In that case, how would you prefer that a situation like this be represented in .debug_frame?
