
Document length limits for which functions are guaranteed to produce the correct result #145

@zacknewman

Description

Currently Encoding::encode_len uses arithmetic that can overflow, causing panics in debug builds and incorrect results in release builds. When the code does not panic, this in turn affects other functions like Encoding::encode, where the resulting encoded value is not correct. Admittedly, this likely only affects 16-bit and 32-bit platforms, and only "large" inputs on those platforms; but such inputs are possible.

For example, the following code, compiled in release mode for a 32-bit target, does not panic:

use data_encoding::BASE64URL_NOPAD;
fn main() {
    // 1 GiB of input: the internal length computation evidently wraps to 0
    // on a 32-bit target, so encode returns an empty String instead of the
    // 1_431_655_766-character encoding.
    let input = vec![0u8; 1_073_741_824];
    assert!(BASE64URL_NOPAD.encode(input.as_slice()).is_empty());
}

This is clearly incorrect though, as the resulting String should have a len of 1_431_655_766. Such an allocation is possible both in Rust, which limits allocations to isize::MAX bytes, and on several OSes on 32-bit platforms, which typically limit user-space addressable memory to 2 or 3 GiB.
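
As a sanity check of that figure (my arithmetic, not part of the original report): the unpadded base64 length of n bytes is 4 * (n / 3) plus 0, 2, or 3 extra symbols depending on n % 3. Computing it in 64-bit arithmetic avoids the 32-bit wrap:

fn main() {
    // Unpadded base64 length: 4 symbols per full 3-byte group, plus 0, 2,
    // or 3 symbols for a trailing group of 0, 1, or 2 bytes.
    let n: u64 = 1_073_741_824;
    let expected = 4 * (n / 3) + [0, 2, 3][(n % 3) as usize];
    assert_eq!(expected, 1_431_655_766);
}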

Ideally, the calculation would be free from overflow, at least when the input length is constrained to isize::MAX, which, as stated, is the maximum allocation Rust allows; but this may not be feasible in general. For example, the function below is free from overflow, but it only works for base64url without padding:

const fn base64url_nopad_len(n: usize) -> usize {
    // Rust never allocates more than isize::MAX bytes.
    const MAX_LEN: usize = isize::MAX as usize;
    assert!(n <= MAX_LEN);
    // 4 symbols per full 3-byte group; the trailing group contributes
    // ceil(4 * (n % 3) / 3) = 0, 2, or 3 symbols. No intermediate value
    // exceeds usize::MAX for any n <= isize::MAX.
    ((n / 3) << 2) + ((n % 3) << 2).div_ceil(3)
}
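
A quick check (my addition, not from the original report) against the 1 GiB example from above:

fn main() {
    // Matches the expected length computed earlier, with no overflow even
    // though the naive "multiply first" approach wraps on 32-bit targets.
    assert_eq!(base64url_nopad_len(1_073_741_824), 1_431_655_766);
}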

Accepting that such an implementation may not be possible, it would be nice if the documentation for the encoding functions explicitly called out this possibility. If this is too much of an edge case though, then I understand.

I don't actually have access to an i686 machine, so I can't test the above code; I'm merely extrapolating from the behavior on my x64 machine. Causing Encoding::encode to produce an incorrect result without a panic on my machine is effectively impossible due to the amount of RAM required, but the following code does not panic when compiled in release mode on my x64 machine:

use data_encoding::BASE64URL_NOPAD;
fn main() {
    // 2^61 on a 64-bit target: the internal length arithmetic evidently
    // wraps to 0, even though the correct encoded length fits in usize.
    const INPUT_LEN: usize = (isize::MAX as usize >> 2) + 1;
    assert_eq!(0, BASE64URL_NOPAD.encode_len(INPUT_LEN));
}
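
For reference (my arithmetic, not from the crate), the correct encoded length for that input is still below isize::MAX, so returning 0 is purely an overflow artifact:

fn main() {
    // Compute the true unpadded base64 length in u128 to rule out overflow.
    const INPUT_LEN: u128 = ((isize::MAX as u128) >> 2) + 1; // 2^61 on x64
    let true_len = 4 * (INPUT_LEN / 3) + [0, 2, 3][(INPUT_LEN % 3) as usize];
    assert!(true_len > 0 && true_len <= isize::MAX as u128);
}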

Perhaps even more unfortunate is that Encoding::decode_len is also susceptible to overflow, despite the fact that the returned length should almost always be less than the input length. Making such a function free from overflow may be easier, so long as operations like division are done first:

const fn base64url_nopad_decode_len(n: usize) -> Option<usize> {
    // A valid unpadded base64 length is never congruent to 1 modulo 4.
    let rem = n & 3;
    if rem == 1 {
        None
    } else {
        // 3 bytes per full 4-symbol group; dividing before multiplying
        // keeps every intermediate value within usize.
        Some((3 * (n >> 2)) + ((3 * rem) >> 2))
    }
}
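
A few illustrative checks (mine, not from the report):

fn main() {
    assert_eq!(base64url_nopad_decode_len(4), Some(3));
    assert_eq!(base64url_nopad_decode_len(5), None);
    // No overflow even at the largest possible input length.
    assert!(base64url_nopad_decode_len(usize::MAX).is_some());
}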
