
Document length limits for which functions are guaranteed to produce the correct result #145

@zacknewman

Description

Currently Encoding::encode_len uses arithmetic that can overflow, causing panics in debug builds and incorrect results in release builds. When the code does not panic, this in turn affects other functions like Encoding::encode, where the resulting encoded value is not correct. Admittedly, this likely only affects 16-bit and 32-bit platforms, and only "large" inputs on those platforms; but such inputs are possible.

For example, the following code, compiled in release mode for a 32-bit target, does not panic:

use data_encoding::BASE64URL_NOPAD;
fn main() {
    // 1 GiB of input: the internal length computation evidently wraps to 0
    // on a 32-bit target, so encode returns an empty String instead of the
    // 1_431_655_766-character encoding.
    let input = vec![0u8; 1_073_741_824];
    assert!(BASE64URL_NOPAD.encode(input.as_slice()).is_empty());
}

This is clearly incorrect though, as the resulting String should have a len of 1_431_655_766. Such an allocation is possible both in Rust, which limits allocations to isize::MAX bytes, and on several OSes on 32-bit platforms, which typically limit user-space addressable memory to 2 or 3 GiB.
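
As a sanity check of that figure (my arithmetic, not part of the original report): the unpadded base64 length of n bytes is 4 * (n / 3) plus 0, 2, or 3 extra symbols depending on n % 3. Computing it in 64-bit arithmetic avoids the 32-bit wrap:

fn main() {
    // Unpadded base64 length: 4 symbols per full 3-byte group, plus 0, 2,
    // or 3 symbols for a trailing group of 0, 1, or 2 bytes.
    let n: u64 = 1_073_741_824;
    let expected = 4 * (n / 3) + [0, 2, 3][(n % 3) as usize];
    assert_eq!(expected, 1_431_655_766);
}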

Ideally, the calculation would be free from overflow, at least when the input length is constrained to isize::MAX, which, as stated, is the maximum allocation Rust allows; but this may not be feasible in general. For example, the function below is free from overflow, but it only works for base64url without padding:

const fn base64url_nopad_len(n: usize) -> usize {
    // Rust never allocates more than isize::MAX bytes.
    const MAX_LEN: usize = isize::MAX as usize;
    assert!(n <= MAX_LEN);
    // 4 symbols per full 3-byte group; the trailing group contributes
    // ceil(4 * (n % 3) / 3) = 0, 2, or 3 symbols. No intermediate value
    // exceeds usize::MAX for any n <= isize::MAX.
    ((n / 3) << 2) + ((n % 3) << 2).div_ceil(3)
}
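
A quick check (my addition, not from the original report) against the 1 GiB example from above:

fn main() {
    // Matches the expected length computed earlier, with no overflow even
    // though the naive "multiply first" approach wraps on 32-bit targets.
    assert_eq!(base64url_nopad_len(1_073_741_824), 1_431_655_766);
}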

Accepting that such an implementation may not be possible, it would be nice if the documentation for the encoding functions explicitly called out this possibility. If this is too much of an edge case though, then I understand.

I don't actually have access to an i686 machine, so I can't test the above code; I'm merely extrapolating from the behavior on my x64 machine. Causing Encoding::encode to produce an incorrect result without a panic on my machine is effectively impossible due to the amount of RAM required, but the following code does not panic when compiled in release mode on my x64 machine:

use data_encoding::BASE64URL_NOPAD;
fn main() {
    // 2^61 on a 64-bit target: the internal length arithmetic evidently
    // wraps to 0, even though the correct encoded length fits in usize.
    const INPUT_LEN: usize = (isize::MAX as usize >> 2) + 1;
    assert_eq!(0, BASE64URL_NOPAD.encode_len(INPUT_LEN));
}
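
For reference (my arithmetic, not from the crate), the correct encoded length for that input is still below isize::MAX, so returning 0 is purely an overflow artifact:

fn main() {
    // Compute the true unpadded base64 length in u128 to rule out overflow.
    const INPUT_LEN: u128 = ((isize::MAX as u128) >> 2) + 1; // 2^61 on x64
    let true_len = 4 * (INPUT_LEN / 3) + [0, 2, 3][(INPUT_LEN % 3) as usize];
    assert!(true_len > 0 && true_len <= isize::MAX as u128);
}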

Perhaps even more unfortunate is that Encoding::decode_len is also susceptible to overflow, despite the fact that the returned length should almost always be less than the input length. Making such a function free from overflow may be easier, so long as operations like division are done first:

const fn base64url_nopad_decode_len(n: usize) -> Option<usize> {
    // A valid unpadded base64 length is never congruent to 1 modulo 4.
    let rem = n & 3;
    if rem == 1 {
        None
    } else {
        // 3 bytes per full 4-symbol group; dividing before multiplying
        // keeps every intermediate value within usize.
        Some((3 * (n >> 2)) + ((3 * rem) >> 2))
    }
}
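
A few illustrative checks (mine, not from the report):

fn main() {
    assert_eq!(base64url_nopad_decode_len(4), Some(3));
    assert_eq!(base64url_nopad_decode_len(5), None);
    // No overflow even at the largest possible input length.
    assert!(base64url_nopad_decode_len(usize::MAX).is_some());
}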
