Proposal: Zero-Extraction Layer Mounting via Native UDF Trailer Indexing #2204

@orent

Abstract

Implement a snapshotter or image mounter that loop-mounts uncompressed OCI tar layers as block device images. Appending a minimal UDF (Universal Disk Format) metadata trailer to an existing uncompressed tarball lets the kernel mount the layer natively using udf.ko. This removes the extraction (untar) step and avoids the user-space round trips of FUSE-based solutions.

Motivation

Standard OCI image extraction is a major bottleneck for container startup and host performance, particularly when dealing with "small-file" metadata pressure on the host filesystem. While projects like stargz-snapshotter address "lazy-pulling," they typically rely on FUSE to bridge the gap between compressed blobs and the VFS.

This proposal offers a "Missing Middle": a way to achieve zero-extraction for local or pre-fetched uncompressed layers using a stable, legacy kernel driver already present in virtually every Linux distribution.

Technical Concept

The architecture treats the uncompressed tarball as the "data partition" of a UDF volume, while placing the metadata descriptors in a synthesized trailer at the end of the file.

  • Block-Level Alignment: UDF supports 512-byte logical blocks. Since tar headers and file data are padded to 512-byte blocks, every file's data starts on a block boundary. The allocation descriptors in each UDF Information Control Block (ICB) can therefore point at the logical block where the file data begins and record its exact byte length, effectively "skipping" the 512-byte tar headers.
  • On-the-Fly Indexing: A single pass over the tar stream generates a UDF metadata trailer (File Identifier Descriptors, ICBs, and Directory structures).
  • The "Trailer-Only" Mount: By leveraging specific udf.ko mount options, we can bypass standard volume identification and point the kernel directly to the trailer:
    mount -t udf -o loop,novrs,anchor=<BLOCK_OFFSET>,bs=512 layer.tar /mnt
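    • loop: Has mount(8) attach the tarball to a loop device.
    • novrs: Skips Volume Recognition Sequence (VRS) detection.
    • anchor: Overrides the standard location of the Anchor Volume Descriptor Pointer.
    • bs: Sets the block size to 512 bytes to match tar alignment.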
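  • OCI Compatibility: The UDF mount is intended to serve as a standard lowerdir for OverlayFS. Note that OverlayFS does not interpret OCI whiteout files (.wh.*) or opaque markers (.wh..wh..opq) itself: it expects whiteouts as 0/0 character devices and opaque directories as a trusted.overlay.opaque xattr. The generator would therefore need to translate these entries while building the UDF metadata, or the snapshotter would need to handle them separately, rather than exposing them as literal files.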
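
The indexing pass described above can be sketched in Python with the standard tarfile module. This is only an illustration under stated assumptions, not the proposed generator: index_tar, build_demo_tar, and MAX_EXTENT are hypothetical names, the extent cap assumes the 30-bit extent length field of ECMA-167 allocation descriptors, and the trailer is assumed to start at the first block after the tarball.

```python
import io
import tarfile

BLOCK = 512                     # tar block size, also the assumed UDF logical block size
MAX_EXTENT = (1 << 30) - BLOCK  # assumption: largest block-multiple length in a 30-bit field


def index_tar(fileobj):
    """Single pass over an uncompressed tar: map each regular file
    to a list of (start_block, byte_length) extents."""
    index = {}
    with tarfile.open(fileobj=fileobj, mode="r:") as tar:
        for member in tar:
            if not member.isreg():
                continue  # directories, symlinks, etc. carry no data extents
            assert member.offset_data % BLOCK == 0  # data follows a 512-byte header
            extents = []
            block = member.offset_data // BLOCK
            remaining = member.size
            while remaining > 0:
                length = min(remaining, MAX_EXTENT)
                extents.append((block, length))
                block += length // BLOCK
                remaining -= length
            index[member.name] = extents
    return index


def build_demo_tar():
    """Build a tiny in-memory tarball so the sketch runs without external files."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in [("etc/hostname", b"demo\n"), ("bin/tool", b"x" * 1300)]:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


raw = build_demo_tar()
layer_index = index_tar(io.BytesIO(raw))
trailer_start_block = len(raw) // BLOCK  # first block available for the UDF trailer

for name, extents in layer_index.items():
    print(name, extents)
print("trailer would start at block", trailer_start_block)
```

The value passed to anchor= would be derived from trailer_start_block plus wherever the generator places the Anchor Volume Descriptor Pointer inside the trailer; that placement is a design choice this sketch leaves open. Hard links, sparse entries, and pax extended headers would need extra handling in a real generator.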

Advantages

  • Native Performance: Avoids FUSE's user-space round trips; reads are served by the in-kernel udf.ko driver and the kernel page cache.
  • Zero Extraction: No per-file metadata/inode allocation on the host filesystem. One layer = One Inode on the host.
  • Standard Driver: Relies on udf.ko, which is ubiquitous and supports 512-byte blocks and anchor overrides on modern kernels.
  • Atomic Operations: Removing a layer becomes a single unlink of one file rather than an O(N) recursive directory deletion.

Potential Implementation

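This could be integrated as a specialized snapshotter for containerd. The implementation would require a lean tar2udf generator capable of synthesizing the required ECMA-167 descriptors (the Anchor Volume Descriptor Pointer, the Volume Descriptor Sequence including the Logical Volume Descriptor, the File Set Descriptor, and the per-file ICBs) and calculating the 16-bit CRC and the tag checksum carried in each descriptor tag of the trailer.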
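
As a rough illustration of the checksum work involved, the following Python sketch builds a 16-byte ECMA-167 descriptor tag. It assumes the CRC is the CCITT polynomial 0x1021 with an initial value of 0, and that the tag checksum is the byte sum modulo 256 of the tag with the checksum byte excluded. The names udf_crc16 and descriptor_tag, the descriptor version default, and the block number 1024 are illustrative choices, not part of any existing tool.

```python
import struct


def udf_crc16(data: bytes) -> int:
    """Bitwise CRC-16 with polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc


def descriptor_tag(tag_id: int, body: bytes, location: int,
                   version: int = 2, serial: int = 0) -> bytes:
    """Build a descriptor tag for `body`, the descriptor bytes that follow the tag.

    Fields, little-endian: identifier, descriptor version, checksum, reserved,
    serial number, CRC, CRC length, tag location (the block holding the descriptor).
    """
    tag = bytearray(struct.pack(
        "<HHBBHHHI",
        tag_id, version, 0, 0, serial, udf_crc16(body), len(body), location,
    ))
    # Checksum covers bytes 0-3 and 5-15; byte 4 is the checksum itself.
    tag[4] = sum(tag[:4] + tag[5:]) % 256
    return bytes(tag)


# Tag identifier 2 is the Anchor Volume Descriptor Pointer.
# The zero-filled body is a placeholder, not a valid anchor descriptor.
placeholder_body = bytes(496)
tag = descriptor_tag(tag_id=2, body=placeholder_body, location=1024)
print(tag.hex())
```

The tag location must match the block the descriptor is actually written to, so a generator would have to lay out the trailer first and only then compute the tags.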
