Adding support for MXFP4,6. #611
Merged
What does this PR do?
Adding support for new quantization types accepted by hardware manufacturers.
https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf
The spec for those dtypes is quite lax.
In response, safetensors doesn't make a choice regarding block size either. MXFP* elements are supposed to be stored in their own tensors, and the scales are stored as E8M0 in another tensor, most likely with one dimension determined by the block size. Implementors are then in charge of placing scales and blocks close enough together, in the proper locations, because those layouts are kernel dependent (so safetensors cannot make a choice here).
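To make the split between element and scale tensors concrete, here is a minimal sketch of dequantizing FP4 (E2M1) elements with E8M0 scales. The function names, the low-nibble-first packing order, and the default block size of 32 are assumptions for illustration; none of this is the safetensors API, and real kernels are free to choose another layout.

```python
# Hypothetical sketch: names, packing order and block size are assumptions.

# The 8 magnitudes encodable by FP4 (E2M1), indexed by the low 3 bits of the
# code; bit 3 is the sign.
FP4_E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def decode_e8m0(byte: int) -> float:
    """Decode an E8M0 scale: a bare 8-bit exponent with bias 127; 0xFF is NaN."""
    if byte == 0xFF:
        return float("nan")
    return 2.0 ** (byte - 127)


def unpack_fp4(packed: bytes) -> list[float]:
    """Unpack two FP4 codes per byte, low nibble first (assumed ordering)."""
    out = []
    for b in packed:
        for code in (b & 0x0F, b >> 4):
            sign = -1.0 if code & 0x8 else 1.0
            out.append(sign * FP4_E2M1_MAGNITUDES[code & 0x7])
    return out


def dequantize(packed: bytes, scales: bytes, block_size: int = 32) -> list[float]:
    """Multiply each element by the scale of the block it belongs to."""
    elems = unpack_fp4(packed)
    if len(elems) != len(scales) * block_size:
        raise ValueError("scales tensor does not match the elements tensor")
    return [x * decode_e8m0(scales[i // block_size]) for i, x in enumerate(elems)]
```

Here the two byte strings stand in for two separate tensors in the file: one holding the packed elements, one holding a single E8M0 byte per block.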
FP4 and FP6 specifically introduce a new kind of issue: some operations could produce data that does not start or end on a byte boundary. Such cases should be rare in real-world use (tensor dimensions tend to be large powers of 2, so slices tend to remain byte-aligned), so the stance taken here is simply to raise an error whenever an operation is not byte-aligned.
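The alignment rule can be sketched as a small check on slice boundaries. The `BITS` table and the `byte_span` helper are hypothetical names used only for illustration, not the functions added by this PR.

```python
# Bit widths of the dtypes discussed here; the table and names are illustrative.
BITS = {"F4": 4, "F6": 6, "F8": 8}


def byte_span(dtype: str, start: int, stop: int) -> tuple[int, int]:
    """Return the (begin, end) byte offsets of elements [start, stop).

    Raises ValueError when either boundary falls inside a byte.
    """
    bits = BITS[dtype]
    begin_bits, end_bits = start * bits, stop * bits
    if begin_bits % 8 or end_bits % 8:
        raise ValueError(f"{dtype} slice [{start}:{stop}] is not byte aligned")
    return begin_bits // 8, end_bits // 8
```

For example, `byte_span("F6", 4, 8)` is accepted because four 6-bit elements fill exactly three bytes, while `byte_span("F6", 1, 4)` raises because element 1 starts at bit 6.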
Technically, we could recover by padding or ignoring sub-byte chunks, but that could also invalidate the zero-copy promise (indexing an f6 element that starts inside a byte would require bit-shifting the entire array to make it byte-aligned and therefore addressable).
For now this is left for future work.
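As a rough illustration of why recovering unaligned data breaks zero-copy, the sketch below extracts a run of bits that starts inside a byte. The helper name and the LSB-first, little-endian bit order are assumptions; the point is only that the result has to be a freshly shifted buffer and cannot be a view into the original bytes.

```python
def extract_bits(data: bytes, bit_offset: int, bit_len: int) -> bytes:
    """Copy `bit_len` bits starting at `bit_offset` into a new aligned buffer.

    Bits are read LSB-first from a little-endian view of `data` (an assumed
    packing order). Unused high bits of the last output byte are zero.
    """
    value = int.from_bytes(data, "little") >> bit_offset  # shift the whole array
    value &= (1 << bit_len) - 1                           # keep only the slice
    return value.to_bytes((bit_len + 7) // 8, "little")
```

With 6-bit elements, element 1 begins at bit 6, so `extract_bits(data, 6, 6)` must shift every following bit before the element becomes addressable on its own.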
Fixes # (issue) or description of the problem this PR solves.