Tracking follow-up work for the TurboQuant/HIGGS KV cache attention backend, which initially landed in #38479.
Backend coverage
flash_attn_varlen_func to FA3/4, not just FA2
Accuracy
--kv-cache-dtype-skip-layers defaults
Feature compatibility
Things currently disabled or unverified with the TurboQuant backend; enable and test:
Performance
cc @vibhavagarwal5