
Proposal: Parallel Uploading with Parallel Hashing #574

@shizhMSFT

Description

Introduction

In the era of AI, registries are no longer limited to storing images and related artifacts. They are increasingly being used to store AI models as artifacts as well. For example, Ollama (registry.ollama.ai, although not OCI-compliant) and Docker (docker.io a.k.a. Docker Hub) both distribute AI models through their respective registries.

Note

By running oras manifest fetch docker.io/ai/phi4:latest --pretty, we can obtain a manifest of the Microsoft Phi-4 14B AI model distributed by Docker Hub.

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.docker.ai.model.config.v0.1+json",
    "size": 371,
    "digest": "sha256:7f4e11b43206e29e2dad7a47ac3c17a2a22c5576c5d9e1a2c0da13bd0fcfb8f4"
  },
  "layers": [
    {
      "mediaType": "application/vnd.docker.ai.gguf.v3",
      "size": 9053114560,
      "digest": "sha256:4c20db82e7abcbd7e07b6f8a393fe82ced1b4c6e15dd87ed1122872331854c1c"
    },
    {
      "mediaType": "application/vnd.docker.ai.license",
      "size": 1105,
      "digest": "sha256:c49419617a6070bcb197cfe272f7007fdec3e790dbb529cb995473bd69c0bd51"
    }
  ]
}

As you can observe, layer 0 is an AI model in GGUF format with a size of 8.4 GiB.

Since AI models are typically large, uploading them to a registry is time consuming and can even be unstable due to networking issues. In fact, this challenge is not limited to AI models: all large artifacts, such as VM images, face the same problem.

To address this issue, the community has explored several approaches. In issue #546, @rchincha proposed support for out-of-order chunked uploading. However, this method may be ineffective because the registry server still needs to read the chunks sequentially to compute the digest, due to the sequential hashing nature of algorithms like SHA-256. Later, in issue #573, @syed introduced the concept of the chunk-index. This approach allows clients to upload chunks in parallel as individual blobs and then commit a chunk index that can be referenced in the manifest. While this method offers clear advantages in terms of upload flexibility and parallelism, it also introduces trade-offs. Specifically, it shifts the responsibility of reassembling the data to the client and increases the burden on the server's garbage collection system due to the additional blob management overhead.

In this proposal, I introduce an alternative approach for the community to evaluate by presenting the ParallelHash function.

Issue Analysis

As @syed noted in issue #573, supporting out-of-order chunked uploads is not a significant challenge, since most registry operators use object storage as their backend. These storage systems typically support large file uploads and allow blocks or parts to be uploaded in parallel.

Tip

Several cloud providers support such large file uploads; for example, Azure Storage provides block blob uploads (Put Block / Put Block List) and Amazon S3 provides multipart uploads.

The fundamental challenge lies in the limitations of current cryptographic hash algorithms. As @sudo-bmitch pointed out in issue #573, there is a need for a hashing algorithm that supports concurrency. Such an algorithm should be capable of processing chunks out of order and computing the final digest without requiring all data to be received sequentially or waiting for the last chunk. @sudo-bmitch also suggested that BLAKE3 could be a potential candidate for this purpose.

Proposal

The proposal is to introduce a new hash algorithm, ParallelHash, for hashing out-of-order chunks in parallel.

The ParallelHash function is a SHA-3 derived function specified in NIST SP 800-185 and built on cSHAKE, a customizable variant of the SHAKE functions defined in FIPS 202. It takes a block size, the data (processed in blocks), an output length, and an optional customization string as input to compute the hash digest. ParallelHash comes in two security strengths: ParallelHash128 for 128-bit security and ParallelHash256 for 256-bit security.

Taking ParallelHash256 as an example, here is its algorithm. Given

  • X: data to be hashed
  • B: block size in bytes such that $0 \lt B \lt 2^{2040}$
  • L: requested output length of the hash digest in bits
  • S: optional customization string

To compute ParallelHash256(X, B, L, S),

  1. Break the data X into n blocks of B bytes each (the last block may be shorter) such that $X = X_1 \Vert X_2 \Vert \cdots \Vert X_n$.
  2. For each block $X_i$, compute its SHAKE256 hash $Z_i = SHAKE256(X_i, 512)$. The length of $Z_i$ is 512 bits (i.e. 64 bytes).
  3. Construct $Z = \mathbf{leftEncode}(B) \Vert Z_1 \Vert Z_2 \Vert \cdots \Vert Z_n \Vert \mathbf{rightEncode}(n) \Vert \mathbf{rightEncode}(L)$.
  4. Return $\mathbf{cSHAKE256}(Z, L, \textbf{"ParallelHash"}, S)$ as the final hash.

Note

I have implemented ParallelHash in Go, with code and documentation (see also the multi-threading hash example) available. It uses the SHAKE and cSHAKE functions from the Go standard library crypto/sha3 package.
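
To make the four steps above concrete, here is a minimal sketch of ParallelHash256 in Go; it is an illustration of the algorithm, not the implementation linked above. It hashes the blocks concurrently in goroutines and then absorbs the 64-byte block hashes in order. It assumes Go 1.24 or later for the standard-library crypto/sha3 package; older toolchains can substitute golang.org/x/crypto/sha3.

package main

import (
    "crypto/sha3"
    "fmt"
    "sync"
)

// leftEncode and rightEncode are defined in NIST SP 800-185, Section 2.3.1:
// the big-endian bytes of x (no leading zero bytes, at least one byte) with a
// one-byte length prefix (left) or length suffix (right).
func leftEncode(x uint64) []byte {
    b := bigEndian(x)
    return append([]byte{byte(len(b))}, b...)
}

func rightEncode(x uint64) []byte {
    b := bigEndian(x)
    return append(b, byte(len(b)))
}

func bigEndian(x uint64) []byte {
    var b []byte
    for shift := 56; shift >= 0; shift -= 8 {
        v := byte(x >> shift)
        if len(b) > 0 || v != 0 || shift == 0 {
            b = append(b, v)
        }
    }
    return b
}

// parallelHash256 computes ParallelHash256(X, B, L, S) following the steps
// above. Each block is hashed with SHAKE256 in its own goroutine; the 64-byte
// block hashes are then absorbed in order by cSHAKE256 with the function name
// "ParallelHash".
func parallelHash256(X []byte, B, L int, S []byte) []byte {
    n := (len(X) + B - 1) / B // number of blocks (0 for empty input)
    Z := make([][]byte, n)

    var wg sync.WaitGroup
    for i := 0; i < n; i++ {
        wg.Add(1)
        go func(i int) {
            defer wg.Done()
            start, end := i*B, (i+1)*B
            if end > len(X) {
                end = len(X)
            }
            Z[i] = sha3.SumSHAKE256(X[start:end], 64) // Z_i = SHAKE256(X_i, 512)
        }(i)
    }
    wg.Wait()

    h := sha3.NewCSHAKE256([]byte("ParallelHash"), S)
    h.Write(leftEncode(uint64(B)))
    for _, zi := range Z {
        h.Write(zi)
    }
    h.Write(rightEncode(uint64(n)))
    h.Write(rightEncode(uint64(L)))

    out := make([]byte, L/8)
    h.Read(out)
    return out
}

func main() {
    // Demo with a tiny block size; the proposal would fix B (e.g. 4 MiB) and L.
    digest := parallelHash256([]byte("hello, ParallelHash"), 8, 512, nil)
    fmt.Printf("%x\n", digest)
}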

Since ParallelHash takes B (block size) and L (hash output length) as parameters, and considering the blob deduplication feature of registries, it is better to fix B and L (see the snippet after the list below).

  • For B, we can choose 4 MiB, 256 MiB, or other values (we can decide later).
  • For L, we can choose 256 for ParallelHash128 and 512 for ParallelHash256.
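
For illustration, a fixed-parameter wrapper might look like the following. It reuses the parallelHash256 sketch above and can live in the same package; the parallelhash256 digest prefix is hypothetical and not a registered algorithm name.

// registryDigest is a hypothetical fixed-parameter wrapper, assuming B = 4 MiB
// and L = 512 bits are standardized and the customization string S is empty.
// The "parallelhash256" prefix is illustrative, not a registered algorithm name.
func registryDigest(blob []byte) string {
    const B = 4 * 1024 * 1024 // fixed block size: 4 MiB
    const L = 512             // fixed output length: 512 bits
    return fmt.Sprintf("parallelhash256:%x", parallelHash256(blob, B, L, nil))
}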

It is worth noting that each data block is identified by a standard SHAKE hash rather than by internal hash state. Therefore, the following large blob upload scenario becomes possible (a client-side sketch follows the list).

  1. POST /v2/<name>/blobs/uploads/ to obtain a UUID for blob uploading.
  2. Break the blob into chunks of block size, and upload the chunks in parallel in an out-of-order fashion: PATCH /v2/<name>/blobs/uploads/<uuid>?digest=<shake_digest>.
    • In the future, the chunk size could be a multiple of the block size, but I need to think about how to transfer multiple digests in a PATCH request, or to find equivalent alternatives.
    • If ?digest=<shake_digest> is not provided, the chunks MUST be uploaded in sequential order and follow today's distribution spec. That is, we have backward compatibility if the registry accepts ParallelHash as digest algorithm but does not support out-of-order chunk uploads.
  3. POST /v2/<name>/blobs/uploads/<uuid>?digest=<parallelhash_digest> to finalize the blob upload with a list of SHAKE digests for the uploaded chunks in the request body.
    • If the block size is 4 MiB, a 40 GiB blob will require $10,240 \times 64 = 655,360$ bytes (i.e. 640 KiB) of hash metadata. Even if this metadata is base64-encoded, it is still less than 1 MiB.
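
Putting the flow together, here is a hedged client-side sketch. It assumes: the registry accepts SHAKE256 chunk digests on PATCH and a ParallelHash256 digest on the finalizing POST; the digest prefixes shake256: and parallelhash256: are hypothetical, unregistered names; the JSON body carrying the ordered chunk digests is illustrative and not part of any spec; and parallelHash256 comes from the sketch earlier in this proposal (place this code in a second file of the same package). Error handling, retries, and authentication are omitted.

package main

import (
    "bytes"
    "crypto/sha3"
    "fmt"
    "net/http"
    "strings"
    "sync"
)

const blockSize = 4 * 1024 * 1024 // proposed fixed block size B (4 MiB)

// uploadBlob uploads a blob using the proposed out-of-order, parallel chunked
// upload flow described in steps 1-3 above.
func uploadBlob(registry, name string, blob []byte) error {
    // Step 1: start an upload session; the registry returns the upload URL
    // (containing the UUID) in the Location header.
    resp, err := http.Post(fmt.Sprintf("%s/v2/%s/blobs/uploads/", registry, name), "", nil)
    if err != nil {
        return err
    }
    resp.Body.Close()
    location := resp.Header.Get("Location")

    // Step 2: hash and upload the chunks in parallel, out of order, each
    // identified by its SHAKE256 digest.
    n := (len(blob) + blockSize - 1) / blockSize
    chunkDigests := make([]string, n)
    errs := make([]error, n)
    var wg sync.WaitGroup
    for i := 0; i < n; i++ {
        wg.Add(1)
        go func(i int) {
            defer wg.Done()
            start, end := i*blockSize, (i+1)*blockSize
            if end > len(blob) {
                end = len(blob)
            }
            chunk := blob[start:end]
            d := fmt.Sprintf("shake256:%x", sha3.SumSHAKE256(chunk, 64)) // hypothetical prefix
            chunkDigests[i] = d
            req, err := http.NewRequest(http.MethodPatch, location+"?digest="+d, bytes.NewReader(chunk))
            if err != nil {
                errs[i] = err
                return
            }
            req.Header.Set("Content-Type", "application/octet-stream")
            r, err := http.DefaultClient.Do(req)
            if err != nil {
                errs[i] = err
                return
            }
            r.Body.Close()
        }(i)
    }
    wg.Wait()
    for _, e := range errs {
        if e != nil {
            return e
        }
    }

    // Step 3: finalize with the ParallelHash256 digest of the whole blob and
    // the ordered list of SHAKE256 chunk digests in the request body.
    blobDigest := fmt.Sprintf("parallelhash256:%x", parallelHash256(blob, blockSize, 512, nil))
    body := strings.NewReader(`{"chunks":["` + strings.Join(chunkDigests, `","`) + `"]}`)
    req, err := http.NewRequest(http.MethodPost, location+"?digest="+blobDigest, body)
    if err != nil {
        return err
    }
    req.Header.Set("Content-Type", "application/json")
    r, err := http.DefaultClient.Do(req)
    if err != nil {
        return err
    }
    r.Body.Close()
    return nil
}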

Additionally, this opens up the possibility for clients to upload the blocks as standard blobs, provided the registry accepts SHAKE algorithms for digests, so that those blocks can be directly referenced by a manifest or a chunk-index blob. Later, the clients can ask the registry to assemble those blocks into a larger blob (see also Put Block From URL (Azure Storage) and UploadPartCopy (Amazon S3)).
