Skip to content

RUST-2028 Fix Decimal128 panic when parsing strings w/o a char boundary at idx 34 #496

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Sep 9, 2024

Conversation

arthurprs
Copy link
Contributor

Fix Decimal128 panic when parsing strings w/o a char boundary at idx 34 (Coefficient::MAX_DIGITS).

@arthurprs arthurprs force-pushed the decimal128-parse-panic branch from d50873b to 3e0dd2f Compare September 3, 2024 17:57
@abr-egn abr-egn changed the title Fix Decimal128 panic when parsing strings w/o a char boundary at idx 34 RUST-2028 Fix Decimal128 panic when parsing strings w/o a char boundary at idx 34 Sep 3, 2024
@abr-egn abr-egn self-requested a review September 3, 2024 18:19
if !s.is_char_boundary(precision) {
// a non-digit (all single byte utf8) would trigger a ParseIntError::InvalidDigit,
// so here we generate a ParseIntError::InvalidDigit kind of error too.
return Err(ParseError::InvalidCoefficient(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd rather have a new enum variant for ParseError (something like ParseError::Unparseable) than synthesize an error this way; that way it could also carry the source string to make debugging easier.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not saying I disagree, but no other error variants carry more information than this one.

InvalidDigit is arguably correct, as that would be the error if the split was character-based (instead of byte-based). The only reason I didn't do that is that it's more and slower code (linear vs. constant time).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we could directly construct an InvalidDigit I'd have less concern. My worry is that by synthesizing it this way it'll only probably continue to be InvalidDigit; the internals of ParseIntError are private so there are no guarantees of stability. If the error changes in the future to include the offending text, for example, it'll be very confusing for downstream users of this crate to see "❤" that doesn't actually appear in their data.

Adding a new variant to ParseError avoids that issue, and if we're doing that it may as well carry the text.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a variant. But also, I'd like to point out that InvalidDigit (and any variant of ParseIntError for that matter) do not carry information about the input, such as the offending character.

@isabelatkinson isabelatkinson merged commit 692cd75 into mongodb:main Sep 9, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants