Description
As visible in the referenced commit below, the GPTQ authors have published some improvements to their quantization method, and those changes are now in QWOP's implementation.
I imagine some testing is required, since QWOP didn't land the changes as a PR with confirmed quality before merging. That said, adding a flag to TGW so that --new-eval is passed on to QWOP would at least be a good place to start for testing.
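As a rough idea of what the passthrough could look like, here is a minimal Python sketch of a launcher-side flag that gets forwarded to the GPTQ-for-LLaMa quantization script. The script path, positional arguments, and subprocess-style invocation are illustrative assumptions, not TGW's actual code; only the --new-eval flag name comes from the GPTQ-for-LLaMa readme.

```python
import argparse
import subprocess
import sys

# Hypothetical sketch: forward --new-eval from a TGW-style launcher to the
# GPTQ-for-LLaMa quantization script. The script location and argument
# layout below are assumptions for illustration only.
parser = argparse.ArgumentParser()
parser.add_argument("--model", required=True, help="model directory to quantize")
parser.add_argument("--new-eval", action="store_true",
                    help="pass --new-eval through to GPTQ-for-LLaMa")
args, passthrough = parser.parse_known_args()

cmd = [sys.executable, "repositories/GPTQ-for-LLaMa/llama.py", args.model, "c4"]
if args.new_eval:
    cmd.append("--new-eval")
cmd.extend(passthrough)  # forward any other unrecognized flags unchanged

subprocess.run(cmd, check=True)
```

Using parse_known_args keeps the launcher from having to mirror every new quantization flag; anything it doesn't recognize is simply forwarded.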
Additional Context
Added to the readme for GPTQ-for-LLaMA:
- Changed to support new features proposed by GPTQ.
- Slightly adjusted preprocessing of C4 and PTB for more realistic evaluations (used in our updated results); can be activated via the flag --new-eval.
- Two new tricks: --act-order (quantizing columns in order of decreasing activation size) and --true-sequential (performing sequential quantization even within a single Transformer block). Those fix GPTQ's strangely bad performance on the 7B model (from 7.15 to 6.09 Wiki2 PPL) and lead to slight improvements on most models/settings in general.
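Regarding the --act-order trick in the last item above, the idea is just to choose the column quantization order by activation statistics. A conceptual PyTorch sketch of that ordering step follows; the use of a Hessian-diagonal proxy for "activation size" is an assumption here, and the real GPTQ implementation differs in detail.

```python
import torch

def act_order_permutation(activation_stat: torch.Tensor) -> torch.Tensor:
    """Illustrative only: return a column order of decreasing activation
    statistic (e.g. a per-column Hessian diagonal), which is the idea
    behind --act-order. Not the actual GPTQ code."""
    return torch.argsort(activation_stat, descending=True)

# Columns with larger activation energy get quantized first.
stat = torch.tensor([0.3, 2.1, 0.9, 1.5])
print(act_order_permutation(stat))  # tensor([1, 3, 2, 0])
```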