Fix AVX2 implementation of IQ4_K, IQ4_KS, IQ5_K, IQ6_K #427
I have made the exact same mistake a number of times.
On AVX2 the instruction to perform dot products of `int8_t` vectors (as needed in quantized matrix multiplications) is `_mm256_maddubs_epi16(x, y)`, where `x` must be unsigned and `y` signed, and the result is a SIMD vector of signed `int16_t` values $z_i = x_{2i} y_{2i} + x_{2i+1} y_{2i+1}$. The quant values `x` and quantized activations `y` are both signed, so one way to deal with the strangeness of this instruction is to add a suitable constant value `c` to `x` so that it becomes unsigned, use `_mm256_maddubs_epi16(c + x, y)` to accumulate the dot product, and at the end subtract $c \cdot b$, where $b = \sum_i y_i$ has been pre-computed when quantizing the activations.
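As a minimal sketch of this offset trick (not the actual ik_llama.cpp kernel code; the function name is made up for illustration):

```c
#include <immintrin.h>
#include <stdint.h>

// Sketch of the offset trick: dot product of 32 signed int8 pairs via the
// unsigned*signed _mm256_maddubs_epi16 instruction. sum_y is sum(y_i),
// pre-computed when the activations were quantized.
static inline int32_t dot_i8_offset(__m256i x, __m256i y, int32_t sum_y) {
    // adding -128 wraps mod 256, i.e. it maps signed [-128,127] to unsigned [0,255]
    __m256i ux   = _mm256_add_epi8(x, _mm256_set1_epi8(-128));
    __m256i prod = _mm256_maddubs_epi16(ux, y);   // pairwise (x+128)*y sums as int16
    // widen the int16 pair sums to int32, then reduce horizontally
    __m256i dot  = _mm256_madd_epi16(prod, _mm256_set1_epi16(1));
    __m128i s    = _mm_add_epi32(_mm256_castsi256_si128(dot),
                                 _mm256_extracti128_si256(dot, 1));
    s = _mm_add_epi32(s, _mm_shuffle_epi32(s, 0x4e)); // add upper/lower 64-bit halves
    s = _mm_add_epi32(s, _mm_shuffle_epi32(s, 0xb1)); // add adjacent 32-bit lanes
    return _mm_cvtsi128_si32(s) - 128 * sum_y;        // subtract c*b with c = 128
}
```

Note that `_mm256_maddubs_epi16` saturates its `int16_t` pair sums, which is exactly where the bug described next comes from.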
The issue arises when the `x` values span the full `int8_t` range, as is the case with the non-linear quants `IQ4_NL`, `IQ4_XS`, `IQ4_K`, `IQ4_KS`, `IQ5_K`, `IQ5_KS`, `IQ6_K`. In that case `c = 128`, the `c + x` values span the full `uint8_t` range, and hence it is possible to overflow the signed `int16_t` range (two adjacent products, each up to $255 \cdot 127 = 32385$, can sum to $64770$, well outside $[-32768, 32767]$). I had thought that I had fixed this mistake, but while working on the `IQ5_KS` type added in PR #422 I noticed that the issue still exists for `IQ4_K`, `IQ4_KS`, `IQ5_K`, and `IQ6_K`, and was only fixed for the corresponding repacked variants.
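One well-known way to avoid the saturation, shown here only as an illustrative sketch and not necessarily the fix applied in this PR, is to move the sign of `x` onto `y`, so that the unsigned operand is $|x| \le 128$ rather than $c + x \le 255$:

```c
#include <immintrin.h>

// Hypothetical overflow-safe variant: apply the sign of x to y and use |x|
// as the unsigned operand of _mm256_maddubs_epi16.
static inline __m256i mul_pairs_i8_safe(__m256i x, __m256i y) {
    __m256i ax = _mm256_sign_epi8(x, x); // |x|, at most 128 (as an unsigned byte)
    __m256i sy = _mm256_sign_epi8(y, x); // y with the sign of x (zeroed where x == 0)
    // worst case per int16 lane: 2 * 128 * 127 = 32512 <= 32767, so no saturation
    return _mm256_maddubs_epi16(ax, sy);
}
```

Whatever form the actual fix takes, avoiding the saturation costs extra instructions in the hot loop, which is consistent with the small performance impact noted below.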
This PR corrects the problem. There will be a slight (a few percent) prompt processing (PP) performance degradation on AVX2 for these quantization types.