Skip to content

[CBOR] - Implement RFC compliant BigInteger bytes encoding & decoding #578

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Apr 30, 2025

Conversation

iifawzi
Copy link
Contributor

@iifawzi iifawzi commented Apr 26, 2025

Hello, I discovered this issue #431 while working on the async parser for CBOR.
This PR adds feature flags to control compliant encoding & decoding behavior as described in the RFC https://datatracker.ietf.org/doc/html/rfc8949#section-3.4.3

  • For encoding: Changed calculation to -1 - n for negative values
  • For decoding:
    • Applied -1 - n formula
    • Updated to treat bytes as signed using new BigInteger(1, _binaryValue)

Changes are controlled by feature flags:

  • CBORGenerator.Feature.CORRECT_CBOR_NEGATIVE_BIGINT_ENCODING
  • CBORParser.Feature.CORRECT_CBOR_NEGATIVE_BIGINT_DECODING

Both flags default to false for backward compatibility.

Since changes are feature-flag controlled, I think they should be safe for 2.19.1 and 2.20 releases (I'm not sure about versioning after the release) @cowtowncoder I'll adjust base branch/comments based on your feedback.

@cowtowncoder
Copy link
Member

First of all: thank you for working on this!

Second of all: Rats! With SemVer, we can't really merge that in 2.19 in a patch as that changes API.
So needs to go in 2.20 -- so 2.x branch is correct.

@iifawzi
Copy link
Contributor Author

iifawzi commented Apr 27, 2025

First of all: thank you for working on this!

Second of all: Rats! With SemVer, we can't really merge that in 2.19 in a patch as that changes API. So needs to go in 2.20 -- so 2.x branch is correct.

Thank you. so I understand even if it's managed by feature flags, we consider it an API change.

I will update the comments, and will also update the tests to a more realistic case, as I realized even if it's possible to have BigInteger(-1), it's not really a big integer, and might be confusing given checking any online encoder of CBOR won't use big integer tag for -1 by default. I will include another test with an actual big integer for clarity.

edit: marking it as a draft temporarily until I do more verifications and add more tests.

Signed-off-by: Fawzi Essam <[email protected]>
@iifawzi iifawzi marked this pull request as draft April 27, 2025 12:04
@iifawzi iifawzi marked this pull request as ready for review April 27, 2025 14:34
@iifawzi
Copy link
Contributor Author

iifawzi commented Apr 27, 2025

The only point to comment on is that we're not following the preferred serialization as described https://www.rfc-editor.org/rfc/rfc8949.html#name-bignums, so encoding -340282366920938463463374607431768211456 using jackson results in:

byte[] expectedBytes = {
                (byte) 0xC3,
                (byte) 0x51, // 17 bytes - leading zero, 16 for the number
                (byte) 0x00, // LEADING Zero
                (byte) 0xFF,
                (byte) 0xFF,
                (byte) 0xFF,
                (byte) 0xFF,
                (byte) 0xFF,
                (byte) 0xFF,
                (byte) 0xFF,
                (byte) 0xFF,
                (byte) 0xFF,
                (byte) 0xFF,
                (byte) 0xFF,
                (byte) 0xFF,
                (byte) 0xFF,
                (byte) 0xFF,
                (byte) 0xFF,
                (byte) 0xFF
        };

which's still considered fine encoding as per the RFC as long as we're able to decode it Decoders that understand these tags MUST be able to decode bignums that do have leading zeroes.

Tests added in the mapper to verify we're able to decode both with/without leading zeros to the same correct value, while for backward compatibility, it remains the same, it returns different incorrect values with/without leading zeros.

EDIT: we're fine with not following the preferred way anyway, it wasn't mentioned in the initial RFC (https://www.rfc-editor.org/rfc/rfc7049#section-2.4.2), only mentioned point is that decoder should be able to decode with/without leading zeros, which's what has been achieved through this PR.

Signed-off-by: Fawzi Essam <[email protected]>
@cowtowncoder
Copy link
Member

Thank you. so I understand even if it's managed by feature flags, we consider it an API change.

More specifically: Addition of said Feature flags is an API change (functionality addition). Something to do in a minor release, but not in patch.

@cowtowncoder
Copy link
Member

Will make some minor changes wrt naming, then merge. Thank you.

One follow-up question: should we consider changing defaults for Jackson 3.0.0? I have mixed feelings about that -- on one hand, we would ideally use correct standard encoding/decoding. But then again this probably breaks handling by users unaware of changes.
But not changing it will perpetuate incorrect handling.

Copy link
Member

@cowtowncoder cowtowncoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@cowtowncoder cowtowncoder merged commit 29c171a into FasterXML:2.x Apr 30, 2025
4 checks passed
@cowtowncoder
Copy link
Member

Big thank you @iifawzi for solving this! I merged it in 2.20 & forward to 3.x.
We can consider changing defaults as a follow-up step.

@iifawzi
Copy link
Contributor Author

iifawzi commented Apr 30, 2025

Will make some minor changes wrt naming, then merge. Thank you.

One follow-up question: should we consider changing defaults for Jackson 3.0.0? I have mixed feelings about that -- on one hand, we would ideally use correct standard encoding/decoding. But then again this probably breaks handling by users unaware of changes. But not changing it will perpetuate incorrect handling.

I've been thinking about while working on the async parser, on whether it should have same experience or not.

In my opinion, we should change defaults for 3.0.0, major releases meant to break compatibility (if needed), and I think in that case it's needed as current encoding leads to entirely incorrect values. However, given the consistency we had for encoding/decoding, users using Jackson for roundtrip wouldn't even notice the fix (unless they have plain assertions in binary). it will break only for users with partial usage of jackson (encoding using jackson and plain binary for decoding, or vise-versa), so I think better to default to true for 3.0.0.

If you agree, I can create a separate PR targeting 3.x

@cowtowncoder
Copy link
Member

cowtowncoder commented Apr 30, 2025

@iifawzi Yes, I concur. PR for change would be welcome. I will create Issue to match PR with.

EDIT: #582

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants