Conversation

Contributor

@Narsil Narsil commented May 30, 2025

What does this PR do?

Adding support for new quantization types accepted by hardware manufacturers.
https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf

The spec for those dtypes is quite lax:

> Therefore, each block of 𝑘 elements can be encoded in (𝑤 + 𝑘𝑑) bits. The layout of the block in physical memory is not prescribed in this specification. If multiple blocks share the same scale factor, an implementation can compress or prune away the repeated scale factors. An implementation can store the scale factor 𝑋 contiguously with or separately from the 𝑘 elements.

In response, safetensors doesn't make a choice regarding block size either. MXFP values are supposed to be stored in their own tensors, and the scales are stored as E8M0 in a separate tensor, most likely with one dimension equal to the block size. Implementors are then in charge of placing scales and blocks close enough together in the proper locations, because those layouts are kernel dependent (so safetensors cannot make a choice here).
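A minimal sketch of that kind of layout (not the safetensors implementation): already-quantized FP4 codes are packed two per byte into one array, and one E8M0 scale byte per block lives in a separate array. The block size of k = 32 comes from the MX spec; the low-nibble-first pairing order is an assumption, since the spec and safetensors leave the physical layout to the implementor.

```python
import numpy as np

BLOCK = 32  # k = 32 elements per block, per the OCP MX spec

def pack_fp4(codes):
    """Pack 4-bit codes (uint8 values 0..15) two per byte.
    Low-nibble-first order is an assumption, not prescribed anywhere."""
    codes = np.asarray(codes, dtype=np.uint8)
    assert codes.size % 2 == 0, "FP4 data must contain an even element count"
    lo = codes[0::2] & 0xF
    hi = codes[1::2] & 0xF
    return lo | (hi << 4)

# 64 fp4 values -> 32 packed bytes, plus one E8M0 scale byte per block of 32.
# E8M0 is a biased power-of-two exponent with no mantissa, so uint8 holds it.
codes = np.arange(64, dtype=np.uint8) % 16
packed = pack_fp4(codes)                               # shape (32,)
scales = np.zeros(codes.size // BLOCK, dtype=np.uint8)  # shape (2,)
```

The two arrays would then be saved as two ordinary tensors; a kernel that consumes them decides how to interleave scales and blocks in device memory.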

Regarding fp4 and fp6 specifically, they introduce a new kind of issue, where some operations might produce data that does not end on a byte boundary. Since those cases are normally few and far between in real-world use (tensors tend to have large power-of-2 dimensions, meaning slices tend to remain byte aligned), the choice made here is simply to raise an error whenever an operation is not byte aligned.

Technically, we could recover by padding/ignoring sub-byte chunks, but that could also invalidate the zero-copy promise (an f6 index falling within a byte would require bit-shifting the entire array to make it byte aligned, and therefore addressable).

For now this is left for future work.
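The raise-on-misalignment policy can be sketched as a small check, hypothetical names and all (this is not the crate's actual API): a slice of a sub-byte dtype is accepted only if both its start and end fall on byte boundaries, in which case it maps directly onto a byte range, preserving zero-copy.

```python
def check_byte_aligned(start_elem, n_elems, bits_per_elem):
    """Map a slice of a sub-byte dtype onto a byte range, or raise.

    Raises ValueError when the slice does not start and end on a byte
    boundary, mirroring the policy adopted here instead of bit-shifting
    the underlying buffer.
    """
    start_bit = start_elem * bits_per_elem
    end_bit = (start_elem + n_elems) * bits_per_elem
    if start_bit % 8 or end_bit % 8:
        raise ValueError(
            f"slice [{start_elem}:{start_elem + n_elems}) of a "
            f"{bits_per_elem}-bit dtype is not byte aligned"
        )
    return start_bit // 8, end_bit // 8
```

For f6, any slice whose length is a multiple of 4 elements starting at a multiple of 4 stays aligned (4 × 6 = 24 bits = 3 bytes); anything else trips the error.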


@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Narsil Narsil merged commit faaeaf0 into main Jun 15, 2025
23 checks passed
@Narsil Narsil deleted the support_mxfp branch June 15, 2025 10:07
