Closed
Labels: enhancement (New feature or request)
Description
Problem
The hashes returned by most hash functions take up more memory than they need to. For instance, even a length-0 `Vector{Int64}` (e.g. as returned by `LpHash`) occupies 40 bytes:
```julia
julia> Base.summarysize(Vector{Int64}(undef, 0))
40
```
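For comparison, a minimal sketch of the size gap between a vector-valued hash and a single compressed `UInt64` key. A bits value such as `UInt64` reports its raw size (8 bytes), while the vector already needs more than that before it stores a single element; the exact figure for the vector may vary between Julia versions, so the sketch prints both rather than assuming a number.

```julia
# Compare the footprint of a compressed UInt64 key with an (empty) vector hash.
compressed  = UInt64(0)
vector_hash = Vector{Int64}(undef, 0)

println("UInt64:          ", Base.summarysize(compressed), " bytes")   # 8 bytes
println("Vector{Int64}(): ", Base.summarysize(vector_hash), " bytes")
```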
Moreover, these hashes are awkward to use as keys in a database or hash table, since such systems generally do not understand the Julia datatype used for the key.
Proposed solution
The solution I'm proposing is to add a function `compress_hash` that accepts a `Vector{<:Integer}` or `BitArray{1}` and converts it into a `UInt32`, `UInt64`, or `Vector{UInt8}`.
- For instance, we could use Julia's built-in `hash` function and simply let `compress_hash(x) = hash(x)`, which returns a `UInt` (a `UInt64` on 64-bit platforms).
- Alternatively, we could reinterpret `x` as an `Array{UInt8}` and use `sha256(x)` from the `SHA` standard library, which returns a `Vector{UInt8}`.
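A minimal sketch of both options, assuming the `sha256` function from Julia's `SHA` standard library. The names `compress_hash32`, `to_bytes`, and `compress_hash_sha256` are hypothetical helpers introduced here for illustration, and the byte conversion assumes a fixed-width integer element type such as `Int64` (not `BigInt`).

```julia
using SHA  # standard library providing sha256

# Option 1: Julia's built-in `hash`, returning a UInt (UInt64 on 64-bit platforms).
compress_hash(x::AbstractVector{<:Integer}) = hash(x)

# Hypothetical helper: truncate to 32 bits when a UInt32 key is wanted.
compress_hash32(x::AbstractVector{<:Integer}) = hash(x) % UInt32

# Option 2: view the elements as raw bytes and hash them with SHA-256.
# Assumes a fixed-width element type; the byte layout follows native endianness.
to_bytes(x::AbstractVector{<:Integer}) = collect(reinterpret(UInt8, x))

# A BitVector cannot be reinterpreted directly, so convert it to Vector{Bool}
# first (one byte per element, 0x00 or 0x01).
to_bytes(x::BitVector) = collect(reinterpret(UInt8, Vector{Bool}(x)))

# Returns a 32-byte Vector{UInt8}.
compress_hash_sha256(x) = sha256(to_bytes(x))
```

One design difference worth noting: `hash` is consistent with `isequal`, so `[1, 2, 3]` and `Int32[1, 2, 3]` get the same key, whereas the SHA-256 route depends on the element width and the machine's byte order. Julia also does not promise that `hash` output stays the same across Julia versions, which matters if the keys are persisted in a database.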
Notes
- It's worth considering whether `compress_hash` needs to be cryptographically secure. I suspect it should be, to be on the safe side for the various applications this package might have. In that case, we will need to define a type such as
  ```julia
  using SHA

  struct HashCompressor
      salt :: Vector{UInt8}
  end

  # Prepend the salt to the input bytes, then hash with SHA-256
  (hashfn::HashCompressor)(x::Vector{UInt8}) = vcat(hashfn.salt, x) |> sha256
  ```
- Adding on to the last bullet point: it may be worth looking at the newer BLAKE3 hash function as a fast alternative to `sha256`, though it's unlikely that we'll be hashing anything large enough to justify going to great lengths to do this.
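A sketch of how the salted `HashCompressor` might be applied to integer-vector hashes, assuming a 16-byte salt drawn from the operating system's RNG via `Random.RandomDevice`. The salt length and the `salted_compress_hash` helper are illustrative assumptions, not part of the proposal. The struct is repeated from the issue text (concatenating with `vcat`) so the sketch runs on its own.

```julia
using SHA
using Random: RandomDevice

# The compressor type from the proposal: prepend the salt, then hash with SHA-256.
struct HashCompressor
    salt::Vector{UInt8}
end

(hashfn::HashCompressor)(x::AbstractVector{UInt8}) = sha256(vcat(hashfn.salt, x))

# Hypothetical helper: view fixed-width integers as raw bytes, then apply the
# salted hash. Restricted to Vector, so a BitVector must be converted first.
salted_compress_hash(hc::HashCompressor, x::Vector{<:Integer}) =
    hc(collect(reinterpret(UInt8, x)))

# Generate the salt once and store it, so the same input always yields the same key.
hc  = HashCompressor(rand(RandomDevice(), UInt8, 16))
key = salted_compress_hash(hc, [1, 2, 3])   # 32-byte Vector{UInt8}
```

Because the salt is random, keys are only reproducible as long as the same `HashCompressor` (or its stored salt) is reused.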