Description
There was a post recently on /r/rust about microbenchmarks in different programming languages (https://github.com/kostya/benchmarks), and I was trying to understand how the Rust implementation works.
I noticed that in the Base64 benchmark the author used this code to construct a string with str_size copies of a character:
let str_size = 10000000;
let mut str: String = "".to_string();
for _ in 0..str_size { str.push_str("a"); }
I thought that push_str is a bit inefficient for a single character, so I looked at the source code. I started playing with String::push and found a trick in the implementation for ASCII characters (relevant PR: #20079).
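For comparison, here are two other ways to build such a string: pushing one char at a time (which goes through String::push, the method discussed below) and str::repeat. These are standard-library methods, not code from the benchmark repository:

```rust
fn main() {
    let str_size = 10_000_000;

    // Pushing a char instead of a &str exercises String::push:
    let mut s = String::with_capacity(str_size);
    for _ in 0..str_size {
        s.push('a');
    }

    // The standard library also has a one-liner for repeating a string:
    let s2 = "a".repeat(str_size);
    assert_eq!(s, s2);
}
```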
The actual problem:
If the character is 1 byte long in UTF-8, it is treated as an ASCII character (as in C), but if it is longer than 1 byte, the algorithm always reserves 4 bytes in the vector, even when the pushed character's encoding is only 2 or 3 bytes. So even if the vector has just enough space for one of these smaller characters, it reallocates its storage (doubling the size of the buffer). As a result, the string ends up with an unnecessarily doubled capacity.
I know this doesn't seem like a big problem, but if push is used heavily with non-ASCII text, it can become a significant memory overhead. The spurious reallocation happens when the string's length is close to a power of two (its capacity), so shorter strings are affected proportionally more.
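A small sketch of the scenario (the exact capacity values depend on the toolchain and allocator, so the comments describe the behavior reported above rather than a guaranteed result):

```rust
fn main() {
    // A string with (typically) exactly 2 free bytes in its buffer:
    let mut s = String::with_capacity(8);
    for _ in 0..6 {
        s.push('a');
    }

    // 'é' encodes to 2 bytes in UTF-8, so it fits in the free space.
    // With the unconditional 4-byte reserve described above, this push
    // would still trigger a reallocation that doubles the buffer.
    s.push('é');
    println!("len = {}, capacity = {}", s.len(), s.capacity());
}
```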
A possible solution:
// This is a modified version of the original
pub fn push(&mut self, ch: char) {
    match ch.len_utf8() {
        1 => self.vec.push(ch as u8),
        ch_len => {
            let cur_len = self.len();
            self.vec.reserve(ch_len);
            unsafe {
                // Attempt to not use an intermediate buffer by just pushing
                // the bytes directly onto this string.
                let slice = slice::from_raw_parts_mut(
                    self.vec.as_mut_ptr().offset(cur_len as isize),
                    ch_len,
                );
                let used = ch.encode_utf8(slice).unwrap_or(0);
                self.vec.set_len(cur_len + used);
            }
        }
    }
}
I benchmarked this version; there was no measurable difference.
There's one problem though: char::len_utf8 is not marked with #[inline], so without LTO this version is currently slower.