
Move to updated GPTQ with new PTB and C4 eval #541

@Loufe

Description

As shown in the referenced commits below, the GPTQ authors published improvements to the quantization procedure, and those changes are now in QWOPQWOP200's GPTQ-for-LLaMA implementation.

I imagine some testing is required, as QWOPQWOP200 didn't land the changes through a PR with confirmed quality before merging. That said, adding a flag to TGW so that --new-eval is passed through to GPTQ-for-LLaMA would be a good place to start for testing.
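As a rough sketch of what such a pass-through flag might look like, the snippet below adds a `--new-eval` option and forwards it when building the argument list for the GPTQ-for-LLaMA scripts. The function and variable names (`build_parser`, `gptq_args`) are illustrative, not TGW's actual code.

```python
import argparse

def build_parser():
    # Hypothetical TGW-style parser exposing the GPTQ-for-LLaMA option.
    parser = argparse.ArgumentParser(description="TGW GPTQ options (sketch)")
    parser.add_argument(
        "--new-eval",
        action="store_true",
        help="Use the updated PTB/C4 preprocessing for evaluation "
             "(forwarded to GPTQ-for-LLaMA).",
    )
    return parser

def gptq_args(args):
    """Translate parsed TGW flags into extra argv for GPTQ-for-LLaMA."""
    extra = []
    if args.new_eval:
        extra.append("--new-eval")
    return extra

if __name__ == "__main__":
    args = build_parser().parse_args(["--new-eval"])
    print(gptq_args(args))  # ['--new-eval']
```

The flag defaults to off, so existing behavior is unchanged unless the user opts in.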

Additional Context

Added to the readme for GPTQ-for-LLaMA:

  • Changed to support new features proposed by GPTQ.
  • Slightly adjusted preprocessing of C4 and PTB for more realistic evaluations (used in our updated results); can be activated via the flag --new-eval.
  • Two new tricks: --act-order (quantizing columns in order of decreasing activation size) and --true-sequential (performing sequential quantization even within a single Transformer block). These fix GPTQ's strangely bad performance on the 7B model (from 7.15 to 6.09 Wiki2 PPL) and lead to slight improvements on most models/settings in general.
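The --act-order idea can be illustrated with a toy sketch: rank weight columns by the magnitude of the activations they see and quantize the largest first. This is a simplification for intuition only; the actual GPTQ code uses the diagonal of the Hessian (activation second moments), which this sketch omits.

```python
import numpy as np

def act_order_permutation(activations: np.ndarray) -> np.ndarray:
    """Return column indices sorted by decreasing mean squared activation."""
    importance = (activations ** 2).mean(axis=0)  # one score per input column
    return np.argsort(-importance)

# Deterministic toy activations: column 1 is largest, then column 2, then 0.
acts = np.array([[1.0, 5.0, 2.0],
                 [1.0, 5.0, 2.0]])
order = act_order_permutation(acts)
print(order)  # [1 2 0] — largest-activation column quantized first
```

Quantizing the most important columns first means they are rounded while the full-precision remainder can still compensate for the error, which is the intuition behind the reported perplexity gains.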

Commits as of early 2023-03-24 by QWOPQWOP200

Metadata

Labels: enhancement (New feature or request)