[AADWARF32] Allocate a register number for CPSR #334

statham-arm · 2025-07-03T16:14:55Z

The document currently says of CPSR, along with the VFP (and FPA) control registers, "It is considered unlikely that these will be needed for producing a stack back-trace in a debugger."

However, CPSR can be required for producing a correct stack backtrace. This can happen with conditional return sequences, for example:

function:
  PUSH   {r4, r5, r6, lr}
  SUB    sp, sp, #64
  // ... do stuff ...
  CMP    this, that       // we will return early if they are equal
  ADDEQ  sp, sp, #64
  POPEQ  {r4, r5, r6, pc}
  // ... now, if we didn't return, continue using our stack frame

In between ADDEQ and POPEQ, the state of the stack depends on the flags in CPSR. If the Z flag is set, then the ADDEQ has happened, and the POPEQ is about to; if Z is clear, neither one has happened. In this example the function has no frame pointer, so the CFA is defined as an offset from sp, and what offset depends on whether we just added 64 to sp.
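To make the two cases concrete, here is a minimal Python model of sp and the CFA at the point between ADDEQ and POPEQ. It assumes 4-byte registers and that the CFA is the value sp had on entry to the function; the helper names and the entry sp of 0x1000 are illustrative only.

```python
# Minimal model of the example function's stack, just before POPEQ executes.
# Assumptions: 4-byte registers, and the CFA is the value sp had on entry.
PUSHED_BYTES = 4 * 4  # PUSH {r4, r5, r6, lr}
LOCALS = 64           # SUB sp, sp, #64 (undone by ADDEQ when Z is set)


def sp_before_popeq(entry_sp: int, z_flag: bool) -> int:
    """Value of sp after PUSH, SUB and the conditional ADDEQ."""
    sp = entry_sp - PUSHED_BYTES - LOCALS
    if z_flag:  # ADDEQ only executes when Z is set
        sp += LOCALS
    return sp


def cfa_offset_before_popeq(z_flag: bool) -> int:
    """Offset that must be added to the current sp to recover the CFA."""
    return PUSHED_BYTES if z_flag else PUSHED_BYTES + LOCALS


for z in (True, False):
    sp = sp_before_popeq(0x1000, z)
    off = cfa_offset_before_popeq(z)
    print(f"Z={int(z)}: sp={sp:#x}, CFA offset={off}, CFA={sp + off:#x}")
```

Both cases recover the same CFA, but only by using different offsets (16 or 80). Any single fixed offset is right in only one of the two cases.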

This style of conditional return has always been possible, since it depends only on the earliest features of the Arm instruction set. The accepted idiom in my experience has always been to write stack frame information that is valid for one case but not the other. Introducing a DWARF register number for CPSR makes it possible to write stack frame information that is valid in both cases. For example, you could write an expression along these lines, which uses the 4 NZCV bits at the top of CPSR to select which bit of a 16-bit constant bitmask to test, so that each Arm condition code can be represented by a different bitmask:

  DW_OP_breg13 0                   // fetch sp (base register plus offset 0)
  DW_OP_constu #bitmask            // bit mask for the particular condition
  DW_OP_bregx #CPSR, 0             // fetch CPSR (register number, offset 0)
  DW_OP_const1u #28
  DW_OP_shr                        // make (CPSR >> 28), just the NZCV bits
  DW_OP_shr                        // shift bit mask right by the NZCV value
  DW_OP_const1u #1
  DW_OP_and                        // AND with 1 to isolate low bit
  DW_OP_bra #label-offset-else     // if nonzero, branch to label-offset-else
  DW_OP_constu #offset2            // load one possible value to add to sp
  DW_OP_skip #label-offset-end     // and skip over the other constant
label-offset-else:
  DW_OP_constu #offset1            // load the other value to add to sp
label-offset-end:
  DW_OP_plus                       // add either offset2 or offset1 to sp
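The bitmask trick can be checked with a small Python sketch. condition_mask builds the 16-bit constant for a condition code by setting bit k whenever the condition holds for NZCV value k (N is bit 3, Z bit 2, C bit 1, V bit 0 of CPSR >> 28), and run is a toy interpreter for just the handful of DW_OP operations in the expression, using symbolic labels instead of byte offsets. This sketch adds the selected offset to sp (DW_OP_plus), on the assumption that the CFA lies above sp; the function names, the register dictionary and the 16/80 offsets (taken from the example function) are illustrative, not part of any real tool.

```python
# Toy model of the CPSR-based CFA expression; NOT a real DWARF evaluator.
MASK32 = 0xFFFFFFFF  # AArch32 DWARF stack values are 32 bits wide


def condition_holds(cond: str, nzcv: int) -> bool:
    """Arm condition code semantics for a 4-bit NZCV value (N is bit 3)."""
    n, z, c, v = (nzcv >> 3) & 1, (nzcv >> 2) & 1, (nzcv >> 1) & 1, nzcv & 1
    return {
        "EQ": z == 1, "NE": z == 0,
        "HS": c == 1, "LO": c == 0,
        "MI": n == 1, "PL": n == 0,
        "VS": v == 1, "VC": v == 0,
        "HI": c == 1 and z == 0, "LS": c == 0 or z == 1,
        "GE": n == v, "LT": n != v,
        "GT": z == 0 and n == v, "LE": z == 1 or n != v,
        "AL": True,
    }[cond]


def condition_mask(cond: str) -> int:
    """Bit k is set iff the condition holds when CPSR >> 28 == k."""
    return sum(1 << k for k in range(16) if condition_holds(cond, k))


def run(program, regs):
    """Evaluate a list of (op, *args) tuples; ("label", name) is a no-op."""
    labels = {op[1]: i for i, op in enumerate(program) if op[0] == "label"}
    stack, pc = [], 0
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "breg":            # DW_OP_breg13 / DW_OP_bregx: reg + offset
            stack.append((regs[args[0]] + args[1]) & MASK32)
        elif op == "const":         # DW_OP_constu / DW_OP_const1u
            stack.append(args[0])
        elif op in ("shr", "and", "plus"):
            top = stack.pop()
            below = stack.pop()
            if op == "shr":
                stack.append(below >> top)
            elif op == "and":
                stack.append(below & top)
            else:
                stack.append((below + top) & MASK32)
        elif op == "bra":           # pop the top value; branch if nonzero
            if stack.pop() != 0:
                pc = labels[args[0]]
        elif op == "skip":          # unconditional branch
            pc = labels[args[0]]
    return stack[-1]


def cfa_program(cond, offset1, offset2):
    """offset1 applies when cond holds, offset2 when it does not."""
    return [
        ("breg", "sp", 0),
        ("const", condition_mask(cond)),
        ("breg", "cpsr", 0),
        ("const", 28), ("shr",),   # CPSR >> 28
        ("shr",),                  # bitmask >> NZCV
        ("const", 1), ("and",),
        ("bra", "else"),
        ("const", offset2),
        ("skip", "end"),
        ("label", "else"),
        ("const", offset1),
        ("label", "end"),
        ("plus",),
    ]


prog = cfa_program("EQ", offset1=16, offset2=80)
print(hex(condition_mask("LE")))                          # 0xf5fa
print(hex(run(prog, {"sp": 0xFF0, "cpsr": 0x40000000})))  # Z set:   0x1000
print(hex(run(prog, {"sp": 0xFB0, "cpsr": 0x00000000})))  # Z clear: 0x1000
```

Here 0x40000000 is a CPSR value with only the Z flag (bit 30) set. With sp at 0xFF0 (ADDEQ done) or 0xFB0 (ADDEQ not done), the same expression yields the same CFA.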

Of course, debuggers will take time to catch up. But a more interesting use case for being able to precisely describe stack situations like this is automatic static checkers for handwritten assembly language with handwritten call frame directives, which verify the semantics of the instructions against the call frame updates next to them. A debugger might almost never stop at the difficult location above, but a static checker will traverse it every time, and needs a way to avoid getting confused.


walkerkd · 2025-07-03

Last comment on the example code should probably be:
"subtract either offset2 or offset1 from sp"

smithp35 · 2025-07-03T16:44:43Z

Adding the register seems reasonable. We're not expecting to be adding large numbers of new registers in AArch32.

Although out of scope for AADWARF32, I'm guessing you will need additional cfi directive(s) to construct the DWARF expression?

walkerkd · 2025-07-04T08:22:45Z

Since DWARF 3 there have been the CFI instructions DW_CFA_expression, DW_CFA_val_expression and DW_CFA_def_cfa_expression, which can take DWARF expressions like the one in the original comment.

However, the Linux .eh_frame specification currently only defines support for DW_CFA_expression (it uses the DWARF 2 specification as the base standard, with some DWARF 3 extensions): https://refspecs.linuxbase.org/LSB_4.0.0/LSB-Core-generic/LSB-Core-generic/dwarfext.html#DWARFEHENCODING

statham-arm · 2025-07-04T08:27:06Z

I think that's fine, because I don't know of any reason why .eh_frame would need to use an expression based on CPSR for determining the stack layout.

It only comes up in .debug_frame because a debugger might be made to stop between any two instructions (especially if you're debugging handwritten assembly, which is the class of code most likely to do tricks like this conditional return sequence in the first place).

But in .eh_frame, when you're unwinding the stack due to a thrown exception, you'd surely only need to unwind through a function at places where an exception might be thrown, which means function calls. And an ABI-compliant function call clobbers CPSR (or at least can't be trusted not to). So if you need CPSR to know your own stack layout before a function call, then on return from the call, you've forgotten it yourself, and won't even be able to clean up your own stack frame at return time.

rearnsha · 2025-07-04T09:48:07Z

I see no value for this in the stack unwinding, since the CPSR isn't preserved over a call and, in general, there's no way of knowing what result a comparison will produce at some future time. A frame that was popped in a different way depending on the condition codes would be unusual in the extreme, and I don't think DWARF could describe that.

On the other hand, this might be useful in some expression cases for debuggers, but perhaps we would need to be clear that it does not form part of the set of registers described by unwinding.

statham-arm · 2025-07-04T10:48:15Z

> Although out of scope for AADWARF32, I'm guessing you will need additional cfi directive(s) to construct the DWARF expression?

I hadn't thought that far, but yes, I suppose so. I don't see any .cfi_foo directive in the current gas docs which constructs a DWARF expression. I think it would be possible to byte-encode one via .cfi_escape, but that's very user-unfriendly!

In fact even if you could write something using an ordinary expression syntax, along the lines of

.cfi_def_cfa_offset (((0xF5FA >> (CPSR >> 28)) & 1) ? offset_if_LE : offset_if_GT)

then it would still look pretty horrible! For practical use you'd want a custom Arm-specific directive that let you directly write a condition code like GT or LO and would construct the right expression.
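As a rough illustration of what such a directive (or a hand-written .cfi_escape) would have to emit, here is a Python sketch that byte-encodes the expression from the first comment as a DW_CFA_def_cfa_expression. Assumptions, all labelled in the code: a little-endian target (for the 2-byte DW_OP_bra/DW_OP_skip operands), the selected offset being added to sp with DW_OP_plus, and a CPSR register number supplied by the caller, since none is allocated yet; the value 999 below is purely hypothetical. Only a few condition masks are tabulated.

```python
import struct

# Standard DWARF opcode values (DWARF 3 and later).
DW_CFA_def_cfa_expression = 0x0F
DW_OP_const1u, DW_OP_constu = 0x08, 0x10
DW_OP_and, DW_OP_plus, DW_OP_shr = 0x1A, 0x22, 0x25
DW_OP_bra, DW_OP_skip = 0x28, 0x2F
DW_OP_breg13 = 0x70 + 13  # DW_OP_breg0 + 13, i.e. sp
DW_OP_bregx = 0x92

# Bit k set iff the condition holds when CPSR >> 28 == k (N=8, Z=4, C=2, V=1).
CONDITION_MASKS = {
    "EQ": 0xF0F0, "NE": 0x0F0F,
    "HS": 0xCCCC, "LO": 0x3333,
    "GT": 0x0A05, "LE": 0xF5FA,
}


def uleb128(value: int) -> bytes:
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def s16(value: int) -> bytes:
    return struct.pack("<h", value)  # ASSUMPTION: little-endian target


def cfa_expression(cond, offset_if_true, offset_if_false, cpsr_regno):
    taken = bytes([DW_OP_constu]) + uleb128(offset_if_true)
    not_taken = bytes([DW_OP_constu]) + uleb128(offset_if_false)
    # DW_OP_bra/DW_OP_skip offsets count from the end of their own operand.
    skip = bytes([DW_OP_skip]) + s16(len(taken))
    return (
        bytes([DW_OP_breg13, 0x00])                              # sp + 0
        + bytes([DW_OP_constu]) + uleb128(CONDITION_MASKS[cond])
        + bytes([DW_OP_bregx]) + uleb128(cpsr_regno) + b"\x00"   # CPSR + 0
        + bytes([DW_OP_const1u, 28, DW_OP_shr, DW_OP_shr])
        + bytes([DW_OP_const1u, 1, DW_OP_and])
        + bytes([DW_OP_bra]) + s16(len(not_taken) + len(skip))
        + not_taken + skip + taken
        + bytes([DW_OP_plus])
    )


def cfi_escape(cond, offset_if_true, offset_if_false, cpsr_regno):
    expr = cfa_expression(cond, offset_if_true, offset_if_false, cpsr_regno)
    payload = bytes([DW_CFA_def_cfa_expression]) + uleb128(len(expr)) + expr
    return ".cfi_escape " + ", ".join(f"0x{b:02x}" for b in payload)


# 999 is a made-up placeholder, NOT the register number this PR allocates.
print(cfi_escape("EQ", 16, 80, cpsr_regno=999))
```

Even for a simple EQ case the output is around 30 hex bytes, which supports the point that a dedicated condition-code directive would be far friendlier than .cfi_escape.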

walkerkd · 2025-07-04T11:13:56Z

@rearnsha raises a valid point here: because CPSR is not preserved across calls, any expression using the CPSR register could not be evaluated in any but the topmost stack frame, since the CPSR register location in the outer frames would be "undefined".

statham-arm · 2025-07-07T07:50:31Z

Ah, yes, I see what you mean. Should I add some text along the lines of "CPSR is only for use in expressions, when the stack layout depends on it, and you aren't allowed to write a CFI rule to say where it's stored, because it never is"?

Mind you, this is surely also very like r0–r3: those too aren't supposed to have CFI rules saying where the caller's version lives, but you can use them in CFI anyway, e.g. you could (if you really wanted to) make the CFA based on r3.

rearnsha · 2025-07-07T10:57:32Z

It's correct that r0-r3 (and IP) can't be used in unwinding (with the AAPCS conventions); the same applies to d0-d7 and d16-d31.
Of course, you could envisage ABI variants where these might become call-saved, and they can certainly all be used in expressions (e.g. for 'virtual' variables). On that basis I think there's some scope for adding the CPSR.
But that raises further questions, such as should the FPSR also be made available (because FP comparisons don't directly write the CPSR)? What about other status bits?

statham-arm · 2025-07-07T12:26:19Z

Surely the only ABI variants we need to care about for these purposes are ones specified in this collection of ABI documents itself. People making up their own variants outside this specification (if any) shouldn't be surprised when this spec doesn't allocate them the DWARF register numbers they need 🙂 So we can add a register number for FPSR if and when we introduce an ABI variant that specifies it as callee-saved.

But one for CPSR is useful now, even though we don't specify it as callee-saved, because in the deepest stack frame it's possible for the stack layout to depend on it.

I can't see how FPSR, or other status bits such as the MVE predicate register (or even the Q flag), could easily have the same property, because only the NZCV flags in CPSR can be used to conditionalize instructions that change the state of the stack (pops, pushes, modifications to sp). To do conditional stack modification based on any other flags, you'd have to transfer those flags into the CPSR, or test them in some other way that ends up in a conditional branch (like putting them into an integer register and doing a TBNZ), and then the stack layout is back to depending on PC and/or CPSR again.

rearnsha · 2025-07-07T13:13:05Z

        vcmpe.f32       s0, s1                   // <- debugger is here
        vmrs    APSR_nzcv, FPSCR
        addeq  sp, sp, #64
        popeq  {r4, r5, r6, pc}

So at the point of the comparison, only the FPSR contains the correct value.

statham-arm · 2025-07-07T13:20:42Z

But at the point where only the FPSR contains the result of the comparison, the stack layout doesn't depend on it. The stack layout is independent of the comparison result until after the addeq, and by that time, the flags it depends on are in CPSR, because that's how the addeq managed to do anything conditional at all.

Directly after the vcmpe.f32, FPSR tells you how the stack is going to be different in two instructions' time, but the debugger doesn't need to know that; it only needs to know what it's like now.

rearnsha · 2025-07-07T13:55:48Z

And when I reach

popeq  {r4, r5, r6, pc}

The CPSR only tells me what it's going to be in one instruction time. But I don't need DWARF information to tell me that; I could work it out from the instruction and the current register set itself.

So what's your point exactly?
DWARF expressions are really intended for things like

 for (int i = 0; i < 100; i++)
   a[i] = 3;

which could be compiled to (assuming a is in r1):

  mov r0, #3
  add r2, r1, #400  // termination value
L0:
  str r0, [r1], #4
  cmp r1, r2
  blt L0

Now the compiler can describe i as 100 - ((r2 - r1) / 4) to allow the debugger to reconstruct the user variable after optimization. Absent such a description, the debugger can only say that it doesn't know what value i has.
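To check that location description, here is a minimal Python simulation of the compiled loop, with addresses as plain integers and an arbitrary base address of 0x8000 (an assumption for illustration). At the L0 label, r1 = a + 4*i and r2 = a + 400, so 100 - ((r2 - r1) / 4) recovers i on every iteration.

```python
def simulate(a_base: int):
    """Run the compiled loop and record i as recovered from r1/r2 at label L0."""
    memory = {}
    r0 = 3                  # mov r0, #3
    r1 = a_base             # r1 holds a
    r2 = r1 + 400           # add r2, r1, #400 (termination value)
    recovered = []
    while True:
        # L0: the debugger's description of the source variable i
        recovered.append(100 - (r2 - r1) // 4)
        memory[r1] = r0     # str r0, [r1], #4 (store, then post-increment)
        r1 += 4
        if not r1 < r2:     # cmp r1, r2 ; blt L0
            break
    return recovered, memory


recovered, memory = simulate(0x8000)
print(recovered[:3], recovered[-1])   # [0, 1, 2] 99
```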

statham-arm · 2025-07-07T14:05:21Z

> The CPSR only tells me what it's going to be in one instruction time.

No – not only that. At the point in between the addeq and the popeq, the CPSR tells me what the stack is like now, because it tells me what the addeq has already done as well as what the popeq is about to do.

If Z is set, then it was also set before the addeq, so the addeq executed, and added 64 to the previous value of sp. If Z is clear now, then it was clear before, so the addeq didn't do that. So the locations of the saved registers relative to the current value of sp are at offsets that depend on CPSR.

> DWARF expressions are really intended for things like [not this]

In that case, how would you prefer that a situation like this be represented in .debug_frame?
