zstd: Improve best encoder by extending backwards #783
Conversation
```go
// We end the search early, so we don't risk 0 literals
// and have to do special offset treatment.
startLimit := nextEmit + 1
```
I don't fully understand this, because the other left-extending code below did not have it. I omitted it in the new code, and all the tests pass, including with debugAsserts = true.
Repeat offsets are slightly different when there are 0 literals. From the zstd format specification:

> There is an exception though, when current sequence's literals_length = 0. In this case, repeated offsets are shifted by one, so an offset_value of 1 means Repeated_Offset2, an offset_value of 2 means Repeated_Offset3, and an offset_value of 3 means Repeated_Offset1 - 1_byte.

This means that if we extend back so there are no literals, the repeat values will be wrong or must be adjusted. However, I believe that the `s > nextEmit` (and not `>=`) takes care of that.
You are however welcome to try to improve that.
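For readers following the thread, here is a minimal sketch of the offset-code rule quoted above. The function and field names (`resolveOffset`, `rep`) are illustrative, not taken from this repository:

```go
// resolveOffset maps an offset_value of 1..3 to a match offset, given the
// three repeat offsets and the literals length of the current sequence.
// Sketch of the rule in the zstd spec; names are illustrative only.
func resolveOffset(offsetValue uint32, litLen int, rep [3]uint32) uint32 {
	if litLen == 0 {
		// With 0 literals, the repeat codes shift by one.
		switch offsetValue {
		case 1:
			return rep[1] // Repeated_Offset2
		case 2:
			return rep[2] // Repeated_Offset3
		default:
			return rep[0] - 1 // Repeated_Offset1 - 1 byte
		}
	}
	return rep[offsetValue-1] // 1..3 select Repeated_Offset1..3 directly
}
```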
Checking against nextEmit+1, which seemed the safest option, increases the output size by 2.6% (!). I've added an assertion instead. Please check if it's in the right place.
When fully supporting these below, we can remove the check.
I may have checked against an older version prior to v1.15.13 by accident. Either way, no big deal if we bring it back.
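One plausible shape for the assertion being discussed, purely as a hypothetical sketch (the PR's actual check may differ; `best.rep` and the `8` flag follow the snippet proposed further down):

```go
// Hypothetical: a repeat match starting at nextEmit has 0 literals and
// must therefore carry the shifted-codes flag (the |8 in the snippet below).
if debugAsserts && s == nextEmit && best.rep > 0 && best.rep&8 == 0 {
	panic("repeat match with 0 literals must use shifted offset codes")
}
```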
Force-pushed from 720f5d2 to 194a8db.
Interestingly, I see a regression on github-june-2days-2019.json and github-ranks-backup.bin from here: https://files.klauspost.com/compress/

We can rewrite the repeat checks to be:

```go
if canRepeat && best.length < goodEnough {
	if s == nextEmit {
		improve(&best, s-offset2, s, uint32(cv), 1|8)
		improve(&best, s-offset3, s, uint32(cv), 2|8)
		if offset1 > 1 {
			improve(&best, s-(offset1-1), s, uint32(cv), 3|8)
		}
	}

	// If either no match or a non-repeat match, check at + 1
	if best.rep <= 0 {
		cv32 := uint32(cv >> 8)
		spp := s + 1
		improve(&best, spp-offset1, spp, cv32, 1)
		improve(&best, spp-offset2, spp, cv32, 2)
		improve(&best, spp-offset3, spp, cv32, 3)
		if best.rep < 0 {
			cv32 = uint32(cv >> 24)
			spp += 2
			improve(&best, spp-offset1, spp, cv32, 1)
			improve(&best, spp-offset2, spp, cv32, 2)
			improve(&best, spp-offset3, spp, cv32, 3)
		}
	}
}

...

seq.offset = uint32(best.rep & 7)

...

switch best.rep {
case 2, 8 | 1:
	offset1, offset2 = offset2, offset1
case 3, 8 | 2:
	offset1, offset2, offset3 = offset3, offset1, offset2
case 8 | 3:
	offset1, offset2, offset3 = offset1-1, offset1, offset2
}
```
You can then remove the entire […]

Another minor tweak is […]

You are welcome to also set […]
(Sent it a bit early; fixed it above.) With these changes I get a good improvement across the board.
Ran your proposal with these changes across the test set, and the results look good.
Closing in favor of #784.
In #776, the compression quality of the zstd SpeedBestCompression encoder decreased, although you may have missed that. In any case, here's a patch that improves it quite a bit, at the expense of some of the speed gain from #776. The final output sizes are smaller than they were prior to that PR.
The encoder now extends matches backwards before estimating their encoded size, rather than doing this after selecting the best match based on the partial matches obtained by only looking forward from the initial match offset.
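As a rough illustration (not the PR's actual code; the exact loop bound around `nextEmit` is what the review thread above debates):

```go
// extendBack grows a candidate match at src[s:], matching src[t:], to the
// left before its encoded size is estimated. Sketch only: it stops at
// nextEmit so already-emitted bytes are never consumed.
func extendBack(src []byte, s, t, length, nextEmit int) (int, int, int) {
	for s > nextEmit && t > 0 && src[s-1] == src[t-1] {
		s--
		t--
		length++
	}
	return s, t, length
}
```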
Benchmarks on amd64:

```
name                              old speed      new speed      delta
Encoder_EncodeAllSimple/best-8    20.7MB/s ± 3%  19.0MB/s ± 1%  -8.04%  (p=0.000 n=19+18)
Encoder_EncodeAllSimple4K/best-8  19.2MB/s ± 6%  17.9MB/s ± 1%  -6.86%  (p=0.000 n=20+20)
```
Output sizes on Silesia and enwik9:

```
dickens      3220994    3179697  (× 0.987179)
enwik9     259846164  257481474  (× 0.990900)
mozilla     16912437   16895142  (× 0.998977)
mr           3502823    3473770  (× 0.991706)
nci          2306320    2300580  (× 0.997511)
ooffice      2896907    2888715  (× 0.997172)
osdb         3390548    3368411  (× 0.993471)
reymont      1657380    1639490  (× 0.989206)
samba        4329898    4315020  (× 0.996564)
sao          5416648    5383855  (× 0.993946)
webster      9972808    9887560  (× 0.991452)
xml           542277     541018  (× 0.997678)
x-ray        5733121    5681186  (× 0.990941)
total      319728325  317035918  (× 0.991579)
```
Wall clock time for compressing enwik9 goes up a bit, but is still close to what it was before #776.