zstd: Improve best encoder by extending backwards #783

Closed
greatroar wants to merge 1 commit from the zstd-match-backward branch

Conversation

greatroar (Contributor)

In #776, the compression quality of the zstd SpeedBestCompression encoder decreased, although you may have misunderstood that. In any case, here's a patch that improves it quite a bit, at the expense of some of the speed gain from #776. The final output sizes are smaller than they were prior to that PR.

The encoder now extends matches backwards before estimating their encoded size, rather than doing so only after selecting the best match based on the partial matches obtained by looking forward from the initial match offset.
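
A minimal sketch of the idea, with simplified, made-up names (match, extendBackwards) rather than the encoder's actual internals:

	// match is a simplified stand-in for the encoder's match candidate.
	type match struct {
		offset int32 // distance back to the matching bytes
		s      int32 // start position of the match in src
		length int32
	}

	// extendBackwards grows the match toward the start of src while the
	// preceding bytes still agree, stopping at nextEmit so the already
	// emitted part of the block is never touched. Doing this before the
	// encoded-size estimate means the estimate covers the full match.
	func extendBackwards(m *match, src []byte, nextEmit int32) {
		for m.s > nextEmit && m.offset < m.s && src[m.s-1] == src[m.s-m.offset-1] {
			m.s--
			m.length++
		}
	}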

Benchmarks on amd64:

name                              old speed      new speed      delta
Encoder_EncodeAllSimple/best-8    20.7MB/s ± 3%  19.0MB/s ± 1%  -8.04%  (p=0.000 n=19+18)
Encoder_EncodeAllSimple4K/best-8  19.2MB/s ± 6%  17.9MB/s ± 1%  -6.86%  (p=0.000 n=20+20)

Output sizes on Silesia and enwik9 (old bytes, new bytes, ratio new/old):

dickens    3220994    3179697 (× 0.987179)
enwik9   259846164  257481474 (× 0.990900)
mozilla   16912437   16895142 (× 0.998977)
mr         3502823    3473770 (× 0.991706)
nci        2306320    2300580 (× 0.997511)
ooffice    2896907    2888715 (× 0.997172)
osdb       3390548    3368411 (× 0.993471)
reymont    1657380    1639490 (× 0.989206)
samba      4329898    4315020 (× 0.996564)
sao        5416648    5383855 (× 0.993946)
webster    9972808    9887560 (× 0.991452)
xml         542277     541018 (× 0.997678)
x-ray      5733121    5681186 (× 0.990941)
total    319728325  317035918 (× 0.991579)

Wall clock time for compressing enwik9 goes up a bit, but is still close to what it was before #776.

Comment on lines -302 to -304
// We end the search early, so we don't risk 0 literals
// and have to do special offset treatment.
startLimit := nextEmit + 1
greatroar (Contributor, Author)

I don't fully understand this, because the other left-extending code below did not have it. I omitted it in the new code, and all the tests pass, even with debugAsserts = true.

klauspost (Owner) commented Mar 19, 2023

Repeat offsets are slightly different when there are 0 literals.

https://github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md#repeat-offsets

There is an exception though, when current sequence's literals_length = 0. In this case, repeated offsets are shifted by one, so an offset_value of 1 means Repeated_Offset2, an offset_value of 2 means Repeated_Offset3, and an offset_value of 3 means Repeated_Offset1 - 1_byte.

This means that if we extend back so that there are no literals, the repeat values will be wrong or must be adjusted.

However, I believe that the s > nextEmit (and not >=) takes care of that.

You are however welcome to try to improve that.
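
For reference, a small sketch of that rule with made-up names (resolveRepeatOffset, rep1..rep3); it only illustrates the spec text quoted above, not the encoder's actual bookkeeping:

	// resolveRepeatOffset maps offset_value 1-3 to the effective offset.
	// With literals_length == 0 the repeats are shifted by one, and
	// offset_value 3 becomes Repeated_Offset1 minus one byte.
	func resolveRepeatOffset(offsetValue, litLen, rep1, rep2, rep3 int) int {
		if litLen != 0 {
			switch offsetValue {
			case 1:
				return rep1
			case 2:
				return rep2
			default:
				return rep3
			}
		}
		switch offsetValue {
		case 1:
			return rep2
		case 2:
			return rep3
		default:
			return rep1 - 1
		}
	}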

greatroar (Contributor, Author)

Checking against nextEmit+1, which seemed the safest option, increases the output size by 2.6% (!). I've added an assertion instead. Please check if it's in the right place.
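
A guess at the shape of such an assertion, assuming it guards the zero-literal repeat case discussed above (best.rep, best.s, and debugAsserts refer to encoder internals); the actual check in the PR may differ:

	// Hypothetical: fail loudly under debugAsserts if a repeat match was
	// extended back all the way to nextEmit, which would leave zero
	// literals and require the repeat offsets to be shifted.
	if debugAsserts && best.rep > 0 && best.s == nextEmit {
		panic("repeat match extended back to zero literals")
	}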

klauspost (Owner)

Once these are fully supported below, we can remove the check.

klauspost (Owner)

I may have checked against an older version prior to v1.15.13 by accident. Either way, no big deal if we bring it back.

greatroar force-pushed the zstd-match-backward branch from 720f5d2 to 194a8db on March 19, 2023. The commit message summarizes the change (benchmarks and output sizes as above):

The SpeedBestCompression encoder now extends matches backwards before
estimating their encoded size, rather than doing this after selecting
the best match. This is a bit slower, but produces smaller output.
klauspost (Owner) commented Mar 19, 2023

Interestingly I see a regression on github-june-2days-2019.json and github-ranks-backup.bin from here: https://files.klauspost.com/compress/

We can rewrite the repeat checks to be:

		if canRepeat && best.length < goodEnough {
			if s == nextEmit {
				improve(&best, s-offset2, s, uint32(cv), 1|8)
				improve(&best, s-offset3, s, uint32(cv), 2|8)
				if offset1 > 1 {
					improve(&best, s-(offset1-1), s, uint32(cv), 3|8)
				}
			}

			// If either no match or a non-repeat match, check at + 1
			if best.rep <= 0 {
				cv32 := uint32(cv >> 8)
				spp := s + 1
				improve(&best, spp-offset1, spp, cv32, 1)
				improve(&best, spp-offset2, spp, cv32, 2)
				improve(&best, spp-offset3, spp, cv32, 3)
				if best.rep < 0 {
					cv32 = uint32(cv >> 24)
					spp += 2
					improve(&best, spp-offset1, spp, cv32, 1)
					improve(&best, spp-offset2, spp, cv32, 2)
					improve(&best, spp-offset3, spp, cv32, 3)
				}
			}
		}
		
		...
		
		seq.offset = uint32(best.rep & 7)
		
		...
			switch best.rep {
			case 2, 8 | 1:
				offset1, offset2 = offset2, offset1
			case 3, 8 | 2:
				offset1, offset2, offset3 = offset3, offset1, offset2
			case 8 | 3:
				offset1, offset2, offset3 = offset1-1, offset1, offset2
			}
		

You can then remove the entire // Check offset 2 section and the check right above.

Another minor tweak: change for index0 < s-1 { to for index0 < s { (in 2 places).

You are also welcome to set const goodEnough = 250 - that gains a little as well. Speed is not affected much, and we want pretty much the best compression we can reasonably get without wasting CPU.

nyc-taxi-data-10M.csv likes this PR in particular.

klauspost (Owner) commented Mar 19, 2023

(sent it a bit early - fixed it above)

With these I get a good improvement across the board.

klauspost (Owner) commented Mar 19, 2023

I ran your proposal with these changes across the test set, and the results look good.

klauspost added a commit that referenced this pull request Mar 21, 2023
greatroar (Contributor, Author)

Closing in favor of #784.

greatroar closed this on Mar 23, 2023
greatroar deleted the zstd-match-backward branch on March 23, 2023