
API additions to allow enum value introspection #1

Closed
leovitch wants to merge 6 commits into thomasvl:master from leovitch:master

Conversation

@leovitch

Here's a proposal for a minimal API to allow introspecting the list of enum values. Let me know if you have suggestions.
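A minimal sketch in C of what such an introspection surface could look like. The type and function names here are hypothetical (the actual change extends the Objective-C GPBEnumDescriptor); the values mirror NestedEnum from unittest.proto:

```c
#include <stdint.h>

/* Hypothetical enum-introspection tables, for illustration only. */
typedef struct {
  const char *name;
  int32_t number;
} EnumValueInfo;

typedef struct {
  const EnumValueInfo *values;
  uint32_t value_count;  /* with allow_alias, distinct names can share a number */
} EnumInfo;

/* Values as declared in unittest.proto's TestAllTypes.NestedEnum. */
static const EnumValueInfo kNestedEnumValues[] = {
    {"FOO", 1}, {"BAR", 2}, {"BAZ", 3}, {"NEG", -1},
};
static const EnumInfo kNestedEnum = {
    kNestedEnumValues,
    sizeof(kNestedEnumValues) / sizeof(kNestedEnumValues[0])};

/* Introspection: fetch the number of the i-th declared value. */
static int32_t enum_value_at(const EnumInfo *e, uint32_t i) {
  return e->values[i].number;
}
```

A caller can then iterate `0..value_count` to enumerate every declared value, including aliases.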

leovitch and others added 4 commits May 21, 2018 10:11
Sync to thomasvl/protobuf head
…lues.

Refactored implementation so that this contains a minimum of added code.

Clarified comments regarding behavior in the presence of the alias_allowed option.

Added unit tests for new functionality and for the alias case.
@@ -834,9 +834,7 @@ - (NSString *)enumNameForValue:(int32_t)number {

Owner


Take out the call to calcValueNameOffsets now, since this isn't directly using nameOffsets_ any more.

NEG = -1; // Intentionally negative.
}

enum NestedEnumAllowingAlias {
Owner


I try not to modify the core protos since the C++ tests drive them; it looks like you can use https://github.com/google/protobuf/blob/master/src/google/protobuf/unittest.proto#L518-L526 instead.
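For reference, the enum at the linked unittest.proto lines already exercises the alias case; it looks roughly like this (abbreviated sketch, see the file itself for the exact contents):

```
enum TestEnumWithDupValue {
  option allow_alias = true;
  FOO1 = 1;
  BAR1 = 2;
  BAZ = 3;
  FOO2 = 1;  // alias of FOO1
  BAR2 = 2;  // alias of BAR1
}
```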

@thomasvl
Owner

Once things are fixed up, go ahead and send it over to google/protobuf, and I can land it.

@leovitch
Author

leovitch commented May 25, 2018 via email

@leovitch
Author

leovitch commented May 25, 2018 via email

@thomasvl thomasvl closed this May 25, 2018
thomasvl pushed a commit that referenced this pull request Nov 4, 2019
thomasvl pushed a commit that referenced this pull request Dec 4, 2023
thomasvl pushed a commit that referenced this pull request Jan 8, 2024
Before, every charAt would emit (on Android):
```
    0x00002104    adrp x17, #+0x1000 (addr 0x3000)
    0x00002108    ldr w17, [x17, #20]
    0x0000210c    ldr x0, [x0, #128]
    0x00002110    ldr x0, [x0, #328]
    0x00002114    ldr lr, [x0, #24]
    0x00002118    blr lr <-- Call into String.charAt(int)
```
Now, it emits the inlined implementation of charAt (branch is for possibly compressed strings):
```
    0x000020b4    ldur w16, [x4, #-8]
    0x000020b8    tbnz w16, #0, #+0xc (addr 0x20c4)
    0x000020bc    ldrb w4, [x4, x0]
    0x000020c0    b #+0x8 (addr 0x20c8)
    0x000020c4    ldrh w4, [x4, x0, lsl #1]
```
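The branch structure above can be sketched in C as follows. The layout and field names are illustrative, not ART's actual String layout; the point is that a flag stored next to the character data selects an 8-bit (compressed) or 16-bit load, matching the `ldrb`/`ldrh` pair in the listing:

```c
#include <stdint.h>

/* Sketch of the inlined charAt: a flag in the string header says whether
 * characters are stored as 8-bit (compressed Latin-1) or 16-bit UTF-16,
 * and the load width is chosen accordingly. Illustrative layout only. */
typedef struct {
  uint32_t count_and_flag;  /* bit 0 set: 16-bit chars; clear: 8-bit chars */
  const void *chars;
} FakeString;

static uint16_t fake_char_at(const FakeString *s, uint32_t index) {
  if (s->count_and_flag & 1u) {
    return ((const uint16_t *)s->chars)[index];  /* ldrh path */
  }
  return ((const uint8_t *)s->chars)[index];     /* ldrb path */
}
```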

PiperOrigin-RevId: 591147406
thomasvl pushed a commit that referenced this pull request Aug 22, 2024
PiperOrigin-RevId: 663974332
thomasvl pushed a commit that referenced this pull request Jan 2, 2025
On a Cortex-A55 this resulted in a 28.30% reduction in CPU and wall time for the binary search path.

Loop body before:
```
.LBB0_2:
        add     w8, w12, #1
        cmp     w8, w11
        b.gt    .LBB0_6 // Predictable branch, ends the loop
.LBB0_3:
        add     w12, w8, w11
        add     w12, w12, w12, lsr #31
        asr     w12, w12, #1
        smaddl  x0, w12, w10, x9
        ldr     w13, [x0]
        cmp     w13, w1
        b.lo    .LBB0_2 // Unpredictable branch here! Will be hit 50/50 in prod
        b.ls    .LBB0_7 // Predictable branch - ends the loop
        sub     w11, w12, #1
        cmp     w8, w11
        b.le    .LBB0_3 // Predictable branch - continues the loop
```

Loop body after:
```
.LBB7_1:
        cmp     w9, w11
        b.hi    .LBB7_4 // Predictable branch - ends the loop
        add     w12, w9, w11
        lsr     w12, w12, #1
        umaddl  x0, w12, w8, x10
        sub     w14, w12, #1
        ldr     w13, [x0]
        cmp     w13, w1
        csel    w11, w14, w11, hs
        csinc   w9, w9, w12, hs
        b.ne    .LBB7_1 // Predictable branch - continues the loop
```

PiperOrigin-RevId: 703214356
thomasvl pushed a commit that referenced this pull request Jun 11, 2025
There is no need to keep these inline as they are not specialized to the function.  We also don't need branch prediction slots for them as the `ret` will always be correctly predicted, and the "end of buffer" branch is rarely taken.

With these changes, the functions to parse fixed fields are about as short and optimal as they could be:

<details>

<summary>x86-64 Assembly</summary>

```
0000000000000000 <upb_DecodeFast_Fixed32_Scalar_Tag1Byte>:
       0: 45 84 c9                      test    r9b, r9b
       3: 0f 85 00 00 00 00             jne     0x9 <upb_DecodeFast_Fixed32_Scalar_Tag1Byte+0x9>
                0000000000000005:  R_X86_64_PLT32       _upb_FastDecoder_DecodeGeneric-0x4
       9: 4c 89 c8                      mov     rax, r9
       c: 48 c1 e8 30                   shr     rax, 0x30
      10: 45 89 ca                      mov     r10d, r9d
      13: 41 c1 ea 18                   shr     r10d, 0x18
      17: 4d 0f ab d0                   bts     r8, r10
      1b: 44 8b 56 01                   mov     r10d, dword ptr [rsi + 0x1]
      1f: 44 89 14 02                   mov     dword ptr [rdx + rax], r10d
      23: 48 83 c6 05                   add     rsi, 0x5
      27: 48 3b 77 08                   cmp     rsi, qword ptr [rdi + 0x8]
      2b: 0f 83 00 00 00 00             jae     0x31 <upb_DecodeFast_Fixed32_Scalar_Tag1Byte+0x31>
                000000000000002d:  R_X86_64_PLT32       upb_DecodeFast_MessageIsDoneFallback-0x4
      31: 0f b7 06                      movzx   eax, word ptr [rsi]
      34: 49 89 c9                      mov     r9, rcx
      37: 49 c1 f9 08                   sar     r9, 0x8
      3b: 41 89 c2                      mov     r10d, eax
      3e: 41 21 ca                      and     r10d, ecx
      41: 41 81 e2 f8 00 00 00          and     r10d, 0xf8
      48: 4f 8b 5c 51 20                mov     r11, qword ptr [r9 + 2*r10 + 0x20]
      4d: 4f 8b 4c 51 18                mov     r9, qword ptr [r9 + 2*r10 + 0x18]
      52: 49 31 c1                      xor     r9, rax
      55: 41 ff e3                      jmp     r11
```

</details>

<details>

<summary>ARM64 Assembly</summary>

```
0000000000000000 <upb_DecodeFast_Fixed32_Scalar_Tag1Byte>:
       0: f2401cbf      tst     x5, #0xff
       4: 54000040      b.eq    0xc <upb_DecodeFast_Fixed32_Scalar_Tag1Byte+0xc>
       8: 14000000      b       0x8 <upb_DecodeFast_Fixed32_Scalar_Tag1Byte+0x8>
                0000000000000008:  R_AARCH64_JUMP26     _upb_FastDecoder_DecodeGeneric
       c: d370fca8      lsr     x8, x5, #48
      10: b8401029      ldur    w9, [x1, #0x1]
      14: 5280002a      mov     w10, #0x1               // =1
      18: 91001421      add     x1, x1, #0x5
      1c: b8286849      str     w9, [x2, x8]
      20: d358fca8      lsr     x8, x5, #24
      24: f9400409      ldr     x9, [x0, #0x8]
      28: 9ac82148      lsl     x8, x10, x8
      2c: eb01013f      cmp     x9, x1
      30: 54000149      b.ls    0x58 <upb_DecodeFast_Fixed32_Scalar_Tag1Byte+0x58>
      34: 79400029      ldrh    w9, [x1]
      38: 9348fc6a      asr     x10, x3, #8
      3c: 12001c6b      and     w11, w3, #0xff
      40: aa040104      orr     x4, x8, x4
      44: 8a09016b      and     x11, x11, x9
      48: 8b0b054a      add     x10, x10, x11, lsl #1
      4c: a941994b      ldp     x11, x6, [x10, #0x18]
      50: ca090165      eor     x5, x11, x9
      54: d61f00c0      br      x6
      58: aa040104      orr     x4, x8, x4
      5c: 14000000      b       0x5c <upb_DecodeFast_Fixed32_Scalar_Tag1Byte+0x5c>
                000000000000005c:  R_AARCH64_JUMP26     upb_DecodeFast_MessageIsDoneFallback
```

</details>
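The outlining idea can be sketched in C. The rarely-taken end-of-buffer path is moved into a separate `noinline` function, so the hot fixed32 parser stays a short straight-line sequence; names and layout here are illustrative, not upb's real internal API:

```c
#include <stdint.h>
#include <string.h>

typedef struct {
  const char *limit;  /* one past the last readable byte */
} FakeDecoder;

/* Cold path: kept out of line so it costs the hot path nothing. */
__attribute__((noinline, cold))
static const char *message_is_done_fallback(FakeDecoder *d, const char *ptr) {
  (void)d;  /* real code would refill the buffer or finish the message */
  return ptr;
}

static const char *parse_fixed32(FakeDecoder *d, const char *ptr,
                                 uint32_t *out) {
  uint32_t v;
  memcpy(&v, ptr + 1, sizeof(v));  /* skip the 1-byte tag, load the field */
  *out = v;
  ptr += 5;
  if (ptr >= d->limit) {
    return message_is_done_fallback(d, ptr);  /* rarely taken */
  }
  return ptr;  /* hot path: dispatch to the next field */
}
```

Since the fallback ends in a plain `ret`, its return is always correctly predicted, which is why it needs no branch-prediction budget of its own.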

PiperOrigin-RevId: 765632464