
Proposal: Parallel Uploading with Parallel Hashing #574

@shizhMSFT

Description

Introduction

In the era of AI, registries are no longer limited to storing images and related artifacts. They are increasingly being used to store AI models as artifacts as well. For example, Ollama (registry.ollama.ai, although not OCI-compliant) and Docker (docker.io a.k.a. Docker Hub) both distribute AI models through their respective registries.

Note

By running oras manifest fetch docker.io/ai/phi4:latest --pretty, we can obtain a manifest of the Microsoft Phi-4 14B AI model distributed by Docker Hub.

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.docker.ai.model.config.v0.1+json",
    "size": 371,
    "digest": "sha256:7f4e11b43206e29e2dad7a47ac3c17a2a22c5576c5d9e1a2c0da13bd0fcfb8f4"
  },
  "layers": [
    {
      "mediaType": "application/vnd.docker.ai.gguf.v3",
      "size": 9053114560,
      "digest": "sha256:4c20db82e7abcbd7e07b6f8a393fe82ced1b4c6e15dd87ed1122872331854c1c"
    },
    {
      "mediaType": "application/vnd.docker.ai.license",
      "size": 1105,
      "digest": "sha256:c49419617a6070bcb197cfe272f7007fdec3e790dbb529cb995473bd69c0bd51"
    }
  ]
}

As you can observe, layer 0 is an AI model in GGUF format with a size of 8.4 GiB.

Since AI models are typically large, uploading them to a registry is time consuming and can even be unstable due to networking issues. In fact, this challenge is not limited to AI models: all large artifacts, such as VM images, face the same problem.

To address this issue, the community has explored several approaches. In issue #546, @rchincha proposed support for out-of-order chunked uploading. However, this method may be ineffective because the registry server still needs to read the chunks sequentially to compute the digest, due to the sequential hashing nature of algorithms like SHA-256. Later, in issue #573, @syed introduced the concept of the chunk-index. This approach allows clients to upload chunks in parallel as individual blobs and then commit a chunk index that can be referenced in the manifest. While this method offers clear advantages in terms of upload flexibility and parallelism, it also introduces trade-offs. Specifically, it shifts the responsibility of reassembling the data to the client and increases the burden on the server's garbage collection system due to the additional blob management overhead.

In this proposal, I introduce an alternative approach for the community to evaluate by presenting the ParallelHash function.

Issue Analysis

As @syed noted in issue #573, supporting out-of-order chunked uploads is not a significant challenge, since most registry operators use object storage as their backend. These storage systems typically support large file uploads and allow blocks or parts to be uploaded in parallel.

Tip

Several cloud providers support such large file uploads; for example, Azure Storage provides block blob uploads (Put Block / Put Block List) and Amazon S3 provides multipart uploads.

The fundamental challenge lies in the limitations of current cryptographic hash algorithms. As @sudo-bmitch pointed out in issue #573, there is a need for a hashing algorithm that supports concurrency. Such an algorithm should be capable of processing chunks out of order and computing the final digest without requiring all data to be received sequentially or waiting for the last chunk. @sudo-bmitch also suggested that BLAKE3 could be a potential candidate for this purpose.

Proposal

The proposal is to introduce a new hash algorithm, ParallelHash, for hashing out-of-order chunks in parallel.

The ParallelHash function is a SHA-3 derived function specified in NIST SP 800-185 and built on cSHAKE, a customizable variant of the SHAKE functions defined in FIPS 202. It takes a block size, the data (processed in blocks), an output length, and an optional customization string as input to compute the hash digest. ParallelHash comes in two security strengths: ParallelHash128 for 128-bit security and ParallelHash256 for 256-bit security.

Taking ParallelHash256 as an example, here is its algorithm. Given

  • X: data to be hashed
  • B: block size in bytes such that $0 \lt B \lt 2^{2040}$
  • L: requested output length of the hash digest in bits
  • S: optional customization string

To compute ParallelHash256(X, B, L, S),

  1. Break the data X into n blocks of B bytes each (the last block may be shorter) such that $X = X_1 \Vert X_2 \Vert \cdots \Vert X_n$.
  2. For each block $X_i$, compute its SHAKE256 hash $Z_i = SHAKE256(X_i, 512)$. The length of $Z_i$ is 512 bits (i.e. 64 bytes).
  3. Construct $Z = \mathbf{leftEncode}(B) \Vert Z_1 \Vert Z_2 \Vert \cdots \Vert Z_n \Vert \mathbf{rightEncode}(n) \Vert \mathbf{rightEncode}(L)$.
  4. Return $\mathbf{cSHAKE256}(Z, L, \textbf{"ParallelHash"}, S)$ as the final hash.

Note

I have implemented ParallelHash in Go, with code and documentation (see also the multi-threading hash example) available. It uses the SHAKE and cSHAKE functions from the Go standard library crypto/sha3 package.
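
To make the four steps above concrete, here is a minimal sketch of ParallelHash256 in Go; it is an illustration of the algorithm, not the implementation linked above. It hashes the blocks concurrently in goroutines and then absorbs the 64-byte block hashes in order. It assumes Go 1.24 or later for the standard-library crypto/sha3 package; older toolchains can substitute golang.org/x/crypto/sha3.

package main

import (
    "crypto/sha3"
    "fmt"
    "sync"
)

// leftEncode and rightEncode are defined in NIST SP 800-185, Section 2.3.1:
// the big-endian bytes of x (no leading zero bytes, at least one byte) with a
// one-byte length prefix (left) or length suffix (right).
func leftEncode(x uint64) []byte {
    b := bigEndian(x)
    return append([]byte{byte(len(b))}, b...)
}

func rightEncode(x uint64) []byte {
    b := bigEndian(x)
    return append(b, byte(len(b)))
}

func bigEndian(x uint64) []byte {
    var b []byte
    for shift := 56; shift >= 0; shift -= 8 {
        v := byte(x >> shift)
        if len(b) > 0 || v != 0 || shift == 0 {
            b = append(b, v)
        }
    }
    return b
}

// parallelHash256 computes ParallelHash256(X, B, L, S) following the steps
// above. Each block is hashed with SHAKE256 in its own goroutine; the 64-byte
// block hashes are then absorbed in order by cSHAKE256 with the function name
// "ParallelHash".
func parallelHash256(X []byte, B, L int, S []byte) []byte {
    n := (len(X) + B - 1) / B // number of blocks (0 for empty input)
    Z := make([][]byte, n)

    var wg sync.WaitGroup
    for i := 0; i < n; i++ {
        wg.Add(1)
        go func(i int) {
            defer wg.Done()
            start, end := i*B, (i+1)*B
            if end > len(X) {
                end = len(X)
            }
            Z[i] = sha3.SumSHAKE256(X[start:end], 64) // Z_i = SHAKE256(X_i, 512)
        }(i)
    }
    wg.Wait()

    h := sha3.NewCSHAKE256([]byte("ParallelHash"), S)
    h.Write(leftEncode(uint64(B)))
    for _, zi := range Z {
        h.Write(zi)
    }
    h.Write(rightEncode(uint64(n)))
    h.Write(rightEncode(uint64(L)))

    out := make([]byte, L/8)
    h.Read(out)
    return out
}

func main() {
    // Demo with a tiny block size; the proposal would fix B (e.g. 4 MiB) and L.
    digest := parallelHash256([]byte("hello, ParallelHash"), 8, 512, nil)
    fmt.Printf("%x\n", digest)
}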

Since ParallelHash takes B (block size) and L (hash output length) as parameters, and considering the blob deduplication feature of registries, it is better to fix B and L (see the snippet after the list below).

  • For B, we can choose 4 MiB, 256 MiB, or other values (we can decide later).
  • For L, we can choose 256 for ParallelHash128 and 512 for ParallelHash256.
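
For illustration, a fixed-parameter wrapper might look like the following. It reuses the parallelHash256 sketch above and can live in the same package; the parallelhash256 digest prefix is hypothetical and not a registered algorithm name.

// registryDigest is a hypothetical fixed-parameter wrapper, assuming B = 4 MiB
// and L = 512 bits are standardized and the customization string S is empty.
// The "parallelhash256" prefix is illustrative, not a registered algorithm name.
func registryDigest(blob []byte) string {
    const B = 4 * 1024 * 1024 // fixed block size: 4 MiB
    const L = 512             // fixed output length: 512 bits
    return fmt.Sprintf("parallelhash256:%x", parallelHash256(blob, B, L, nil))
}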

It is worth noting that each data block is identified by a standard SHAKE hash rather than by internal hash state. Therefore, the following large blob upload scenario becomes possible (a client-side sketch follows the list).

  1. POST /v2/<name>/blobs/uploads/ to obtain a UUID for blob uploading.
  2. Break the blob into chunks of block size, and upload the chunks in parallel in an out-of-order fashion: PATCH /v2/<name>/blobs/uploads/<uuid>?digest=<shake_digest>.
    • In the future, the chunk size could be a multiple of the block size, but I need to think about how to transfer multiple digests in a PATCH request, or to find equivalent alternatives.
    • If ?digest=<shake_digest> is not provided, the chunks MUST be uploaded in sequential order and follow today's distribution spec. That is, we have backward compatibility if the registry accepts ParallelHash as digest algorithm but does not support out-of-order chunk uploads.
  3. POST /v2/<name>/blobs/uploads/<uuid>?digest=<parallelhash_digest> to finalize the blob upload with a list of SHAKE digests for the uploaded chunks in the request body.
    • If the block size is 4 MiB, a 40 GiB blob will require $10,240 \times 64 = 655,360$ bytes (i.e. 640 KiB) of hash metadata. Even if this metadata is base64-encoded, it is still less than 1 MiB.
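
Putting the flow together, here is a hedged client-side sketch. It assumes: the registry accepts SHAKE256 chunk digests on PATCH and a ParallelHash256 digest on the finalizing POST; the digest prefixes shake256: and parallelhash256: are hypothetical, unregistered names; the JSON body carrying the ordered chunk digests is illustrative and not part of any spec; and parallelHash256 comes from the sketch earlier in this proposal (place this code in a second file of the same package). Error handling, retries, and authentication are omitted.

package main

import (
    "bytes"
    "crypto/sha3"
    "fmt"
    "net/http"
    "strings"
    "sync"
)

const blockSize = 4 * 1024 * 1024 // proposed fixed block size B (4 MiB)

// uploadBlob uploads a blob using the proposed out-of-order, parallel chunked
// upload flow described in steps 1-3 above.
func uploadBlob(registry, name string, blob []byte) error {
    // Step 1: start an upload session; the registry returns the upload URL
    // (containing the UUID) in the Location header.
    resp, err := http.Post(fmt.Sprintf("%s/v2/%s/blobs/uploads/", registry, name), "", nil)
    if err != nil {
        return err
    }
    resp.Body.Close()
    location := resp.Header.Get("Location")

    // Step 2: hash and upload the chunks in parallel, out of order, each
    // identified by its SHAKE256 digest.
    n := (len(blob) + blockSize - 1) / blockSize
    chunkDigests := make([]string, n)
    errs := make([]error, n)
    var wg sync.WaitGroup
    for i := 0; i < n; i++ {
        wg.Add(1)
        go func(i int) {
            defer wg.Done()
            start, end := i*blockSize, (i+1)*blockSize
            if end > len(blob) {
                end = len(blob)
            }
            chunk := blob[start:end]
            d := fmt.Sprintf("shake256:%x", sha3.SumSHAKE256(chunk, 64)) // hypothetical prefix
            chunkDigests[i] = d
            req, err := http.NewRequest(http.MethodPatch, location+"?digest="+d, bytes.NewReader(chunk))
            if err != nil {
                errs[i] = err
                return
            }
            req.Header.Set("Content-Type", "application/octet-stream")
            r, err := http.DefaultClient.Do(req)
            if err != nil {
                errs[i] = err
                return
            }
            r.Body.Close()
        }(i)
    }
    wg.Wait()
    for _, e := range errs {
        if e != nil {
            return e
        }
    }

    // Step 3: finalize with the ParallelHash256 digest of the whole blob and
    // the ordered list of SHAKE256 chunk digests in the request body.
    blobDigest := fmt.Sprintf("parallelhash256:%x", parallelHash256(blob, blockSize, 512, nil))
    body := strings.NewReader(`{"chunks":["` + strings.Join(chunkDigests, `","`) + `"]}`)
    req, err := http.NewRequest(http.MethodPost, location+"?digest="+blobDigest, body)
    if err != nil {
        return err
    }
    req.Header.Set("Content-Type", "application/json")
    r, err := http.DefaultClient.Do(req)
    if err != nil {
        return err
    }
    r.Body.Close()
    return nil
}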

Additionally, this opens up the possibility for clients to upload the blocks as standard blobs, provided the registry accepts SHAKE algorithms for digests, so that those blocks can be directly referenced by a manifest or a chunk-index blob. Later, the clients can ask the registry to assemble those blocks into a larger blob (see also Put Block From URL (Azure Storage) and UploadPartCopy (Amazon S3)).
