Make reads thread-safe by not reusing hasher by default #43

Merged
tylertreat merged 2 commits into tylertreat:master from tcard:master on Nov 13, 2025
@tcard tcard commented Nov 4, 2025

We're seeing data races when calling CountMinSketch.Count:

WARNING: DATA RACE
Read at 0x00c1515ff498 by goroutine 128:
  hash/fnv.(*sum64).Write()
      $GOROOT/src/hash/fnv/fnv.go:121 +0x36
  github.com/tylertreat/BoomFilters.hashKernel()
      /__w/vendor/github.com/tylertreat/BoomFilters/boom.go:78 +0x56
  github.com/tylertreat/BoomFilters.(*CountMinSketch).Count()
      /__w/vendor/github.com/tylertreat/BoomFilters/countmin.go:106 +0x7b

Previous write at 0x00c1515ff498 by goroutine 181:
  hash/fnv.(*sum64).Write()
      $GOROOT/src/hash/fnv/fnv.go:126 +0x98
  github.com/tylertreat/BoomFilters.hashKernel()
      /__w/vendor/github.com/tylertreat/BoomFilters/boom.go:78 +0x56
  github.com/tylertreat/BoomFilters.(*CountMinSketch).Count()
      /__w/vendor/github.com/tylertreat/BoomFilters/countmin.go:106 +0x7b

The race is on the *sum64 receiver of the Write method in hash/fnv. It happens because the library stores a single *sum64, wrapped in a hash.Hash64, and reuses it instead of calling fnv.New64 (or fnv.New32) afresh every time it needs to hash something. Concurrent hash operations therefore read from and write to the same *sum64. This isn't thread-safe: hash operations will step on each other.

To avoid this, initially I changed the code to call fnv.New64 for each hash operation. This, however, has a noticeable impact on performance, as a heap allocation needs to happen each time.
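
For illustration, that discarded approach can be sketched roughly as below. This is a minimal sketch and not the library's code: `hashFresh` and `concurrentHashes` are hypothetical names made up for this example. Each call builds its own hasher, so goroutines share no state, at the cost of constructing a hasher per operation.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

// hashFresh (hypothetical name) creates a new FNV hasher on every call,
// so no hasher state is shared between callers.
func hashFresh(data []byte) uint64 {
	h := fnv.New64()
	h.Write(data)
	return h.Sum64()
}

// concurrentHashes (hypothetical name) hashes the same data from n
// goroutines and returns every result. Each goroutine writes only to
// its own slice element.
func concurrentHashes(data []byte, n int) []uint64 {
	out := make([]uint64, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			out[i] = hashFresh(data)
		}(i)
	}
	wg.Wait()
	return out
}

func main() {
	results := concurrentHashes([]byte("key"), 8)
	fmt.Println(len(results))
}
```

Running this under `go test -race` or `go run -race` should report no race, since the only shared memory is the result slice and each goroutine owns one index.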

But really, this allocation doesn't need to exist to begin with: sum64 is just a uint64, and it can be stack-allocated and passed around as one would with any word-sized value. It's really an unfortunate consequence of the design of hash that this isn't available out of the box, and we're forced to go through interfaces just to move uint64s around.

So instead, the (small) bits of code needed to implement FNV have been copied into plain, interface-less functions, which are called by default unless a custom hash.Hash32/64 is provided.
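
As a rough sketch of the idea (the actual helpers in the pull request's fnv.go may be shaped differently), 64-bit FNV-1 can be written as a plain function over a `uint64`. The names `fnv1Sum64` and `stdlibSum64` are hypothetical; the constants are the standard FNV-1 64-bit offset basis and prime, and the comparison against `fnv.New64` is only there to check the two agree.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const (
	fnvOffset64 uint64 = 14695981039346656037 // FNV 64-bit offset basis
	fnvPrime64  uint64 = 1099511628211        // FNV 64-bit prime
)

// fnv1Sum64 (hypothetical name) computes the 64-bit FNV-1 hash of data.
// The running state is a local uint64, so nothing is shared or allocated.
func fnv1Sum64(data []byte) uint64 {
	h := fnvOffset64
	for _, b := range data {
		h *= fnvPrime64 // FNV-1 multiplies first...
		h ^= uint64(b)  // ...then XORs in the next byte.
	}
	return h
}

// stdlibSum64 (hypothetical name) computes the same hash through the
// hash.Hash64 interface returned by fnv.New64, for comparison.
func stdlibSum64(data []byte) uint64 {
	h := fnv.New64()
	h.Write(data)
	return h.Sum64()
}

func main() {
	data := []byte("hello")
	fmt.Println(fnv1Sum64(data) == stdlibSum64(data))
}
```

Because the state lives in a local variable, concurrent callers cannot interfere with each other, and there is no interface value to construct.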

Benchmarks:

With direct calls (pull request)
goos: darwin
goarch: arm64
pkg: github.com/tylertreat/BoomFilters
cpu: Apple M4
                              │  base.bench   │               new.bench               │
                              │    sec/op     │    sec/op      vs base                │
HashKernel-10                    3.208n ±  0%   2.479n ±   1%  -22.72% (p=0.000 n=10)
BucketsIncrement-10              7.290n ±  0%   7.293n ±   0%        ~ (p=0.107 n=10)
BucketsSet-10                    4.245n ±  0%   4.246n ±   1%        ~ (p=0.697 n=10)
BucketsGet-10                    3.458n ±  0%   3.468n ±   8%   +0.29% (p=0.009 n=10)
BloomAdd-10                      16.79n ±  1%   16.08n ±   0%   -4.26% (p=0.000 n=10)
BloomTest-10                     6.691n ±  0%   6.272n ±   0%   -6.26% (p=0.000 n=10)
BloomTestAndAdd-10               21.03n ±  0%   21.11n ±   0%   +0.43% (p=0.000 n=10)
CountingAdd-10                   23.14n ±  1%   22.83n ±   1%   -1.32% (p=0.002 n=10)
CountingTest-10                  6.729n ±  2%   6.312n ±   0%   -6.19% (p=0.000 n=10)
CountingTestAndAdd-10            27.11n ±  0%   26.23n ±   0%   -3.23% (p=0.000 n=10)
CountingTestAndRemove-10         13.78n ±  0%   13.38n ±   0%   -2.94% (p=0.000 n=10)
CMSWriteDataTo-10                7.568µ ±  1%   7.598µ ±  12%        ~ (p=0.165 n=10)
CMSReadDataFrom-10               4.487µ ± 66%   4.624µ ±   2%        ~ (p=0.184 n=10)
CMSAdd-10                        6.301n ±  0%   5.571n ±   0%  -11.59% (p=0.000 n=10)
CMSCount-10                      7.843n ±  1%   7.341n ±   0%   -6.40% (p=0.000 n=10)
CMSReset-10                      30.42µ ±  0%   30.39µ ±   0%        ~ (p=0.342 n=10)
CuckooAdd-10                     320.2n ±  1%   337.5n ±   3%   +5.40% (p=0.000 n=10)
CuckooTest-10                    300.2n ± 71%   310.9n ±  72%        ~ (p=0.353 n=10)
CuckooTestAndAdd-10              91.58n ±  2%   91.46n ±   9%        ~ (p=0.953 n=10)
CuckooTestAndRemove-10           91.09n ±  8%   98.22n ± 454%   +7.82% (p=0.023 n=10)
DeletableAdd-10                  22.68n ±  0%   22.78n ±   0%   +0.42% (p=0.003 n=10)
DeletableTest-10                 6.426n ±  1%   5.904n ±   1%   -8.12% (p=0.000 n=10)
DeletableTestAndAdd-10           31.33n ±  0%   29.58n ±   0%   -5.57% (p=0.000 n=10)
DeletableTestAndRemove-10        13.54n ±  0%   12.90n ±   0%   -4.73% (p=0.000 n=10)
HllWriteDataTo-10                106.0n ±  0%   108.3n ±   1%   +2.22% (p=0.001 n=10)
HllReadDataFrom-10               62.36n ±  2%   62.93n ±   1%        ~ (p=0.306 n=10)
HLLCount4-10                     137.3n ±  1%   137.0n ±   1%        ~ (p=0.170 n=10)
HLLCount5-10                     289.9n ±  3%   286.7n ±   3%        ~ (p=0.280 n=10)
HLLCount6-10                     568.8n ±  1%   563.5n ±   1%   -0.93% (p=0.000 n=10)
HLLCount7-10                     1.170µ ±  2%   1.134µ ±   1%   -3.12% (p=0.000 n=10)
HLLCount8-10                     2.435µ ±  3%   2.322µ ±   0%   -4.66% (p=0.000 n=10)
HLLCount9-10                     5.236µ ±  9%   4.879µ ±   3%   -6.81% (p=0.020 n=10)
HLLCount10-10                    10.29µ ±  7%   10.42µ ±   5%        ~ (p=0.912 n=10)
InverseAdd-10                    30.03n ±  2%   22.74n ±   3%  -24.26% (p=0.000 n=10)
InverseTest-10                  12.435n ±  0%   6.550n ±   1%  -47.32% (p=0.000 n=10)
InverseTestAndAdd-10             40.35n ±  1%   31.15n ±   1%  -22.81% (p=0.000 n=10)
MinHash-10                       86.95m ±  1%   87.68m ±   0%   +0.84% (p=0.005 n=10)
PartitionedBloomAdd-10           16.20n ±  0%   16.60n ±   0%   +2.50% (p=0.000 n=10)
PartitionedBloomTest-10          6.704n ±  3%   6.194n ±   1%   -7.61% (p=0.000 n=10)
PartitionedBloomTestAndAdd-10    21.53n ±  0%   20.88n ±   0%   -2.97% (p=0.000 n=10)
ScalableBloomAdd-10              183.6n ±  1%   185.2n ±   0%   +0.87% (p=0.001 n=10)
ScalableBloomTest-10             7.498n ±  0%   6.671n ±   0%  -11.03% (p=0.000 n=10)
ScalableBloomTestAndAdd-10       890.4n ±  3%   898.5n ±   0%        ~ (p=1.000 n=10)
StableAdd-10                     56.59n ±  0%   56.11n ±   0%   -0.87% (p=0.000 n=10)
StableTest-10                    6.579n ±  0%   6.263n ±   0%   -4.80% (p=0.000 n=10)
StableTestAndAdd-10              72.04n ±  0%   72.56n ±   0%   +0.73% (p=0.002 n=10)
UnstableAdd-10                   24.05n ±  0%   22.57n ±   0%   -6.19% (p=0.000 n=10)
UnstableTest-10                  6.550n ±  1%   6.067n ±   1%   -7.36% (p=0.000 n=10)
UnstableTestAndAdd-10            28.77n ±  0%   28.14n ±   0%   -2.17% (p=0.000 n=10)
TopKAdd-10                       14.43n ±  0%   13.66n ±   0%   -5.37% (p=0.000 n=10)
geomean                          76.76n         73.12n          -4.74%

                   │  base.bench  │               new.bench               │
                   │     B/op     │     B/op      vs base                 │
CMSWriteDataTo-10    123.4Ki ± 1%   123.2Ki ± 1%       ~ (p=0.631 n=10)
CMSReadDataFrom-10   21.27Ki ± 0%   21.27Ki ± 0%       ~ (p=1.000 n=10) ¹
HllWriteDataTo-10      828.5 ± 1%     839.5 ± 1%  +1.33% (p=0.000 n=10)
HllReadDataFrom-10     152.0 ± 0%     152.0 ± 0%       ~ (p=1.000 n=10) ¹
geomean              4.214Ki        4.226Ki       +0.28%
¹ all samples are equal

                   │ base.bench │              new.bench              │
                   │ allocs/op  │ allocs/op   vs base                 │
CMSWriteDataTo-10    7.000 ± 0%   7.000 ± 0%       ~ (p=1.000 n=10) ¹
CMSReadDataFrom-10   4.000 ± 0%   4.000 ± 0%       ~ (p=1.000 n=10) ¹
HllWriteDataTo-10    6.000 ± 0%   6.000 ± 0%       ~ (p=1.000 n=10) ¹
HllReadDataFrom-10   4.000 ± 0%   4.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean              5.091        5.091       +0.00%
¹ all samples are equal

With indirect calls through new hasher per operation (discarded option)
goos: darwin
goarch: arm64
pkg: github.com/tylertreat/BoomFilters
cpu: Apple M4
                              │  base.bench   │              iface.bench               │
                              │    sec/op     │    sec/op      vs base                 │
HashKernel-10                    3.208n ±  0%    9.258n ±  2%  +188.59% (p=0.000 n=10)
BucketsIncrement-10              7.290n ±  0%    7.281n ±  0%         ~ (p=0.224 n=10)
BucketsSet-10                    4.245n ±  0%    4.236n ±  0%    -0.21% (p=0.008 n=10)
BucketsGet-10                    3.458n ±  0%    3.449n ±  0%    -0.26% (p=0.000 n=10)
BloomAdd-10                      16.79n ±  1%    21.85n ±  1%   +30.10% (p=0.000 n=10)
BloomTest-10                     6.691n ±  0%   12.125n ±  2%   +81.23% (p=0.000 n=10)
BloomTestAndAdd-10               21.03n ±  0%    27.33n ±  1%   +29.99% (p=0.000 n=10)
CountingAdd-10                   23.14n ±  1%    27.51n ±  3%   +18.91% (p=0.000 n=10)
CountingTest-10                  6.729n ±  2%   12.340n ±  2%   +83.39% (p=0.000 n=10)
CountingTestAndAdd-10            27.11n ±  0%    31.55n ±  2%   +16.38% (p=0.000 n=10)
CountingTestAndRemove-10         13.78n ±  0%    18.83n ±  2%   +36.61% (p=0.000 n=10)
CMSWriteDataTo-10                7.568µ ±  1%    7.443µ ± 24%         ~ (p=0.393 n=10)
CMSReadDataFrom-10               4.487µ ± 66%    6.984µ ± 31%   +55.66% (p=0.050 n=10)
CMSAdd-10                        6.301n ±  0%   12.350n ±  3%   +96.02% (p=0.000 n=10)
CMSCount-10                      7.843n ±  1%   15.500n ±  1%   +97.63% (p=0.000 n=10)
CMSReset-10                      30.42µ ±  0%    30.35µ ±  0%    -0.23% (p=0.016 n=10)
CuckooAdd-10                     320.2n ±  1%    398.7n ±  5%   +24.50% (p=0.000 n=10)
CuckooTest-10                   300.15n ± 71%    94.58n ±  5%   -68.49% (p=0.009 n=10)
CuckooTestAndAdd-10              91.58n ±  2%    97.19n ±  1%    +6.13% (p=0.000 n=10)
CuckooTestAndRemove-10           91.09n ±  8%    95.62n ±  2%    +4.97% (p=0.019 n=10)
DeletableAdd-10                  22.68n ±  0%    26.70n ±  1%   +17.72% (p=0.000 n=10)
DeletableTest-10                 6.426n ±  1%   12.165n ±  2%   +89.29% (p=0.000 n=10)
DeletableTestAndAdd-10           31.33n ±  0%    35.15n ±  1%   +12.21% (p=0.000 n=10)
DeletableTestAndRemove-10        13.54n ±  0%    18.16n ±  3%   +34.08% (p=0.000 n=10)
HllWriteDataTo-10                106.0n ±  0%    104.2n ±  1%    -1.70% (p=0.001 n=10)
HllReadDataFrom-10               62.36n ±  2%    65.50n ±  6%    +5.04% (p=0.042 n=10)
HLLCount4-10                     137.3n ±  1%    126.0n ±  1%    -8.23% (p=0.000 n=10)
HLLCount5-10                     289.9n ±  3%    266.2n ±  2%    -8.17% (p=0.000 n=10)
HLLCount6-10                     568.8n ±  1%    536.1n ±  6%    -5.74% (p=0.000 n=10)
HLLCount7-10                     1.170µ ±  2%    1.049µ ±  5%   -10.34% (p=0.000 n=10)
HLLCount8-10                     2.435µ ±  3%    2.239µ ±  2%    -8.05% (p=0.000 n=10)
HLLCount9-10                     5.236µ ±  9%    4.425µ ±  2%   -15.48% (p=0.000 n=10)
HLLCount10-10                   10.289µ ±  7%    9.247µ ±  9%   -10.12% (p=0.002 n=10)
InverseAdd-10                    30.03n ±  2%    29.20n ±  1%    -2.76% (p=0.000 n=10)
InverseTest-10                   12.44n ±  0%    14.08n ±  3%   +13.23% (p=0.000 n=10)
InverseTestAndAdd-10             40.35n ±  1%    37.75n ±  1%    -6.44% (p=0.000 n=10)
MinHash-10                       86.95m ±  1%    80.53m ±  2%    -7.38% (p=0.000 n=10)
PartitionedBloomAdd-10           16.20n ±  0%    21.95n ±  1%   +35.50% (p=0.000 n=10)
PartitionedBloomTest-10          6.704n ±  3%   12.035n ±  3%   +79.53% (p=0.000 n=10)
PartitionedBloomTestAndAdd-10    21.53n ±  0%    26.95n ±  2%   +25.23% (p=0.000 n=10)
ScalableBloomAdd-10              183.6n ±  1%    155.3n ±  1%   -15.39% (p=0.000 n=10)
ScalableBloomTest-10             7.498n ±  0%   12.730n ±  3%   +69.78% (p=0.000 n=10)
ScalableBloomTestAndAdd-10       890.4n ±  3%    855.4n ±  2%    -3.93% (p=0.000 n=10)
StableAdd-10                     56.59n ±  0%    63.02n ±  3%   +11.34% (p=0.000 n=10)
StableTest-10                    6.579n ±  0%   12.180n ±  1%   +85.13% (p=0.000 n=10)
StableTestAndAdd-10              72.04n ±  0%    81.65n ±  1%   +13.35% (p=0.000 n=10)
UnstableAdd-10                   24.05n ±  0%    27.86n ±  3%   +15.80% (p=0.000 n=10)
UnstableTest-10                  6.550n ±  1%   12.340n ±  3%   +88.41% (p=0.000 n=10)
UnstableTestAndAdd-10            28.77n ±  0%    34.08n ±  1%   +18.46% (p=0.000 n=10)
TopKAdd-10                       14.43n ±  0%    31.82n ±  1%  +120.55% (p=0.000 n=10)
geomean                          76.76n          91.69n         +19.46%

                   │  base.bench  │              iface.bench              │
                   │     B/op     │     B/op      vs base                 │
CMSWriteDataTo-10    123.4Ki ± 1%   121.0Ki ± 1%  -1.95% (p=0.002 n=10)
CMSReadDataFrom-10   21.27Ki ± 0%   21.27Ki ± 0%       ~ (p=1.000 n=10) ¹
HllWriteDataTo-10      828.5 ± 1%     819.0 ± 1%  -1.15% (p=0.005 n=10)
HllReadDataFrom-10     152.0 ± 0%     152.0 ± 0%       ~ (p=1.000 n=10) ¹
geomean              4.214Ki        4.181Ki       -0.78%
¹ all samples are equal

                   │ base.bench │             iface.bench             │
                   │ allocs/op  │ allocs/op   vs base                 │
CMSWriteDataTo-10    7.000 ± 0%   7.000 ± 0%       ~ (p=1.000 n=10) ¹
CMSReadDataFrom-10   4.000 ± 0%   4.000 ± 0%       ~ (p=1.000 n=10) ¹
HllWriteDataTo-10    6.000 ± 0%   6.000 ± 0%       ~ (p=1.000 n=10) ¹
HllReadDataFrom-10   4.000 ± 0%   4.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean              5.091        5.091       +0.00%
¹ all samples are equal

@dimitarvdimitrov dimitarvdimitrov left a comment

Alternately we can have a sync pool with hashers instead of allocating one on each call. Will that simplify the code?

Comment thread fnv.go Outdated
h.Reset()
return sum
}
fnv.New64()
Contributor

Isn't this a stray call?

Contributor Author

Yes, thanks, no idea how that ended up there. Fixed at 1966bbf.

tcard commented Nov 5, 2025

I added some benchmarks to the pull request description.

> Alternately we can have a sync pool with hashers instead of allocating one on each call. Will that simplify the code?

@dimitarvdimitrov In this PR we're not allocating anything, just a local (stack) uint64 to hold the hash. A pool is pure overhead in this case for sharing objects that shouldn't exist to begin with.

I think avoiding this overhead is worth the extra code of copying the FNV functions over.

@tylertreat tylertreat merged commit 2976346 into tylertreat:master Nov 13, 2025
@tylertreat
Owner

Thanks!

tcard added a commit to grafana/mimir that referenced this pull request Nov 18, 2025
#### What this PR does

Upgrades BoomFilters to apply this fix:

* tylertreat/BoomFilters#43

#### Which issue(s) this PR fixes or relates to

—

#### Checklist

- [ ] Tests updated.
- [ ] Documentation added.
- [ ] `CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`. If changelog entry is not needed, please add the `changelog-not-needed` label to the PR.
- [ ] [`about-versioning.md`](https://github.com/grafana/mimir/blob/main/docs/sources/mimir/configure/about-versioning.md) updated with experimental features.

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Bumps `github.com/tylertreat/BoomFilters` and updates vendored code to use internal FNV helper hashing across filters, removing legacy FNV usages and minor vendor files.
>
> - **Dependencies**:
>   - Upgrade `github.com/tylertreat/BoomFilters` to `v0.0.0-20251117164519-53813c36cc1b` in `go.mod`/`go.sum` and `vendor/modules.txt`.
> - **Vendored library (`vendor/github.com/tylertreat/BoomFilters`)**:
>   - Add `fnv.go` with internal FNV helper functions and refactor hashing to use these in `boom.go`, `cuckoo.go`, `hyperloglog.go`, `inverse.go`, `partitioned.go`, `stable.go`, `countmin.go`, `classic.go`, `counting.go`.
>   - Adjust constructors/decoders to stop defaulting to `fnv` in several filters and use new helpers; switch `InverseBloomFilter` indexing to optional pooled hash or default helper.
>   - Remove `.travis.yml`; simplify `README.md` badges.
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 33bc6a1. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->