
Allow compressing hashes into a single UInt or Vector{UInt} #7

@kernelmethod


Problem

The hashes returned by most hash functions use a lot of memory relative to the information they carry. For instance, even a length-0 Vector{Int64} (the type returned by e.g. LpHash) takes up 40 bytes:

julia> Base.summarysize(Vector{Int64}(undef, 0))
40
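For comparison, here is a rough sketch of the saving on offer. The vector below is an arbitrary stand-in for a vector-valued hash (not real LpHash output), and Base.hash folds it into a single machine word:

```julia
# Arbitrary stand-in for a vector-valued hash; not actual LpHash output.
h = Int64[3, -1, 4, 1, 5, 9, 2, 6]

# Array header plus 8 bytes per element (exact figure depends on the Julia version).
vec_bytes = Base.summarysize(h)

# Base.hash returns a single UInt (UInt64 on 64-bit platforms).
compressed = hash(h)
scalar_bytes = Base.summarysize(compressed)

println("vector hash: $vec_bytes bytes; compressed: $scalar_bytes bytes")
```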

Moreover, these hashes are awkward to use as keys in a database or hash table, since such systems generally won't understand the datatype being used for the key.
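One way around the key-type problem, sketched under the assumption that the target store accepts fixed-width strings or 64-bit integers: compress first, then render the result as a zero-padded hex string. The variable names and the item id are illustrative only.

```julia
# Illustrative vector-valued hash (arbitrary values).
h = Int64[3, -1, 4, 1]

# Compress to a single UInt, then format as fixed-width hexadecimal
# (two hex digits per byte), which most databases can store as a plain key.
key = hash(h)
dbkey = string(key; base = 16, pad = 2 * sizeof(UInt))

# An in-memory bucket table keyed by the compressed hash works the same way.
table = Dict{UInt, Vector{Int}}()
push!(get!(table, key, Int[]), 42)   # 42: hypothetical item id
```

Note that Base.hash values are not guaranteed to be stable across Julia versions or sessions, so keys meant to be persisted would be safer derived from a fixed algorithm such as SHA-256.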

Proposed solution

The solution I'm proposing is to add a function compress_hash that accepts a Vector{<:Integer} or BitArray{1} and converts it into a UInt32, UInt64, or Vector{UInt8}.

  • For instance, we could use Julia's built-in hash function and simply let compress_hash(x) = hash(x), which returns a UInt (UInt64 on 64-bit platforms).
  • Alternatively, we could reinterpret x as a Vector{UInt8} and apply sha256 (from the SHA standard library) to the result, which returns a 32-byte Vector{UInt8}.
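A minimal sketch of both options, assuming the SHA standard library for sha256. The names compress_hash32 and compress_hash_sha256 are hypothetical, used here only to keep the variants apart:

```julia
using SHA  # standard library; provides sha256

# Option 1: fold the hash into one UInt via Base.hash.
# Bool <: Integer, so this method also accepts a BitVector (BitArray{1}).
compress_hash(x::AbstractVector{<:Integer}) = hash(x)

# A UInt32 variant obtained by truncating the result (hypothetical name).
compress_hash32(x::AbstractVector{<:Integer}) = hash(x) % UInt32

# Option 2: SHA-256 over the raw bytes, returning a 32-byte Vector{UInt8}.
# Byte order follows the host platform, so digests may differ across endianness.
compress_hash_sha256(x::Vector{T}) where {T<:Base.BitInteger} =
    sha256(collect(reinterpret(UInt8, x)))

# For a BitVector, avoid relying on its packed internal storage;
# spend one byte per bit instead.
compress_hash_sha256(x::BitVector) = sha256(UInt8.(x))
```

The hash-based variant is cheap and fixed-size; the SHA-256 variant is slower but its output is well defined independently of Julia's internal hashing.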

Notes

  • It's worth considering whether compress_hash needs to be cryptographically secure. I suspect it should be, to be on the safe side for the various potential applications of this package. In that case, we will need to define a type such as
using SHA  # provides sha256

struct HashCompressor
    salt :: Vector{UInt8}
end

# Prepend the salt to the input bytes (vcat, not hcat, which would build a matrix), then hash
(hashfn::HashCompressor)(x::Vector{UInt8}) = vcat(hashfn.salt, x) |> sha256
  • Following on from the previous point: it may be worth looking at the newer BLAKE3 as a faster alternative to sha256, though it's unlikely that we'll be hashing anything large enough to justify much extra effort here.
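A usage sketch of the salted HashCompressor from the notes above, restated so the snippet is self-contained and assuming the SHA and Random standard libraries. The Vector{<:Integer} convenience method and the 16-byte salt length are illustrative choices, not part of the proposal:

```julia
using SHA      # sha256
using Random   # RandomDevice, for an unpredictable salt

struct HashCompressor
    salt :: Vector{UInt8}
end

# Prefix the salt to the message bytes, then hash; returns a 32-byte digest.
(hashfn::HashCompressor)(x::Vector{UInt8}) = sha256(vcat(hashfn.salt, x))

# Hypothetical convenience method: accept integer hashes by viewing their raw bytes.
# The Vector{UInt8} method above is more specific, so it is chosen for byte input.
(hashfn::HashCompressor)(x::Vector{T}) where {T<:Base.BitInteger} =
    hashfn(collect(reinterpret(UInt8, x)))

# Draw a 16-byte salt from the operating system's entropy source.
compressor = HashCompressor(rand(RandomDevice(), UInt8, 16))

h = Int64[3, -1, 4, 1]
digest = compressor(h)
```

If a secret-keyed hash is what is really wanted, HMAC (SHA.jl provides hmac_sha256(key, data)) is the standard construction, rather than simply prefixing the salt.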

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests