zstd: Improve best encoder by extending backwards #783

Closed
greatroar wants to merge 1 commit from the zstd-match-backward branch

Conversation

greatroar (Contributor)

In #776, the compression quality of the zstd SpeedBestCompression encoder decreased, although you may have misunderstood that. In any case, here's a patch that improves it quite a bit, at the expense of some of the speed gain from #776. The final output sizes are smaller than they were prior to that PR.

The encoder now extends matches backwards before estimating their encoded size, rather than doing so only after selecting the best match based on the partial matches obtained by looking forward from the initial match offset.
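
A minimal sketch of the idea, with simplified, made-up names (match, extendBackwards) rather than the encoder's actual internals:

	// match is a simplified stand-in for the encoder's match candidate.
	type match struct {
		offset int32 // distance back to the matching bytes
		s      int32 // start position of the match in src
		length int32
	}

	// extendBackwards grows the match toward the start of src while the
	// preceding bytes still agree, stopping at nextEmit so the already
	// emitted part of the block is never touched. Doing this before the
	// encoded-size estimate means the estimate covers the full match.
	func extendBackwards(m *match, src []byte, nextEmit int32) {
		for m.s > nextEmit && m.offset < m.s && src[m.s-1] == src[m.s-m.offset-1] {
			m.s--
			m.length++
		}
	}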

Benchmarks on amd64:

name                              old speed      new speed      delta
Encoder_EncodeAllSimple/best-8    20.7MB/s ± 3%  19.0MB/s ± 1%  -8.04%  (p=0.000 n=19+18)
Encoder_EncodeAllSimple4K/best-8  19.2MB/s ± 6%  17.9MB/s ± 1%  -6.86%  (p=0.000 n=20+20)

Output sizes on Silesia and enwik9 (old bytes, new bytes, ratio new/old):

dickens    3220994    3179697 (× 0.987179)
enwik9   259846164  257481474 (× 0.990900)
mozilla   16912437   16895142 (× 0.998977)
mr         3502823    3473770 (× 0.991706)
nci        2306320    2300580 (× 0.997511)
ooffice    2896907    2888715 (× 0.997172)
osdb       3390548    3368411 (× 0.993471)
reymont    1657380    1639490 (× 0.989206)
samba      4329898    4315020 (× 0.996564)
sao        5416648    5383855 (× 0.993946)
webster    9972808    9887560 (× 0.991452)
xml         542277     541018 (× 0.997678)
x-ray      5733121    5681186 (× 0.990941)
total    319728325  317035918 (× 0.991579)

Wall clock time for compressing enwik9 goes up a bit, but is still close to what it was before #776.

Comment on lines -302 to -304
// We end the search early, so we don't risk 0 literals
// and have to do special offset treatment.
startLimit := nextEmit + 1
greatroar (Contributor, Author)

I don't fully understand this, because the other left-extending code below did not have it. I omitted it in the new code, and all the tests pass, even with debugAsserts = true.

klauspost (Owner) commented Mar 19, 2023

Repeat offsets are slightly different when there are 0 literals.

https://github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md#repeat-offsets

There is an exception though, when current sequence's literals_length = 0. In this case, repeated offsets are shifted by one, so an offset_value of 1 means Repeated_Offset2, an offset_value of 2 means Repeated_Offset3, and an offset_value of 3 means Repeated_Offset1 - 1_byte.

This means that if we extend back so that there are no literals, the repeat values will be wrong or must be adjusted.

However, I believe that the s > nextEmit (and not >=) takes care of that.

You are however welcome to try to improve that.
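
For reference, a small sketch of that rule with made-up names (resolveRepeatOffset, rep1..rep3); it only illustrates the spec text quoted above, not the encoder's actual bookkeeping:

	// resolveRepeatOffset maps offset_value 1-3 to the effective offset.
	// With literals_length == 0 the repeats are shifted by one, and
	// offset_value 3 becomes Repeated_Offset1 minus one byte.
	func resolveRepeatOffset(offsetValue, litLen, rep1, rep2, rep3 int) int {
		if litLen != 0 {
			switch offsetValue {
			case 1:
				return rep1
			case 2:
				return rep2
			default:
				return rep3
			}
		}
		switch offsetValue {
		case 1:
			return rep2
		case 2:
			return rep3
		default:
			return rep1 - 1
		}
	}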

greatroar (Contributor, Author)

Checking against nextEmit+1, which seemed the safest option, increases the output size by 2.6% (!). I've added an assertion instead. Please check if it's in the right place.
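
A guess at the shape of such an assertion, assuming it guards the zero-literal repeat case discussed above (best.rep, best.s, and debugAsserts refer to encoder internals); the actual check in the PR may differ:

	// Hypothetical: fail loudly under debugAsserts if a repeat match was
	// extended back all the way to nextEmit, which would leave zero
	// literals and require the repeat offsets to be shifted.
	if debugAsserts && best.rep > 0 && best.s == nextEmit {
		panic("repeat match extended back to zero literals")
	}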

klauspost (Owner)

Once these are fully supported below, we can remove the check.

klauspost (Owner)

I may have checked against an older version prior to v1.15.13 by accident. Either way, no big deal if we bring it back.

greatroar force-pushed the zstd-match-backward branch from 720f5d2 to 194a8db on March 19, 2023. The commit message summarizes the change (benchmarks and output sizes as above):

The SpeedBestCompression encoder now extends matches backwards before
estimating their encoded size, rather than doing this after selecting
the best match. This is a bit slower, but produces smaller output.
klauspost (Owner) commented Mar 19, 2023

Interestingly I see a regression on github-june-2days-2019.json and github-ranks-backup.bin from here: https://files.klauspost.com/compress/

We can rewrite the repeat checks to be:

		if canRepeat && best.length < goodEnough {
			if s == nextEmit {
				improve(&best, s-offset2, s, uint32(cv), 1|8)
				improve(&best, s-offset3, s, uint32(cv), 2|8)
				if offset1 > 1 {
					improve(&best, s-(offset1-1), s, uint32(cv), 3|8)
				}
			}

			// If either no match or a non-repeat match, check at + 1
			if best.rep <= 0 {
				cv32 := uint32(cv >> 8)
				spp := s + 1
				improve(&best, spp-offset1, spp, cv32, 1)
				improve(&best, spp-offset2, spp, cv32, 2)
				improve(&best, spp-offset3, spp, cv32, 3)
				if best.rep < 0 {
					cv32 = uint32(cv >> 24)
					spp += 2
					improve(&best, spp-offset1, spp, cv32, 1)
					improve(&best, spp-offset2, spp, cv32, 2)
					improve(&best, spp-offset3, spp, cv32, 3)
				}
			}
		}
		
		...
		
		seq.offset = uint32(best.rep & 7)
		
		...
			switch best.rep {
			case 2, 8 | 1:
				offset1, offset2 = offset2, offset1
			case 3, 8 | 2:
				offset1, offset2, offset3 = offset3, offset1, offset2
			case 8 | 3:
				offset1, offset2, offset3 = offset1-1, offset1, offset2
			}
		

You can then remove the entire // Check offset 2 section and the check right above.

Another minor tweak: change for index0 < s-1 { to for index0 < s { (in 2 places).

You are also welcome to set const goodEnough = 250 - that gains a little as well. Speed is not affected much, and we want pretty much the best compression we can reasonably get without wasting CPU.

nyc-taxi-data-10M.csv likes this PR in particular.

klauspost (Owner) commented Mar 19, 2023

(sent it a bit early - fixed it above)

With these I get a good improvement across the board.

klauspost (Owner) commented Mar 19, 2023

I ran your proposal with these changes across the test set, and the results look good.

klauspost added a commit that referenced this pull request Mar 21, 2023
greatroar (Contributor, Author)

Closing in favor of #784.

greatroar closed this on Mar 23, 2023
greatroar deleted the zstd-match-backward branch on March 23, 2023