
Quantization model behavior changed #43725

@jiqing-feng

System Info

torch 2.10.0
peft 0.18.2.dev0
bitsandbytes 0.49.1
transformers is the only package that changed between the passing and failing runs.

Who can help?

@ArthurZucker

Reproduction

The regression was found in peft tests:
https://github.com/jiqing-feng/peft/blob/8bit/tests/test_gpu_examples.py#L2901
RUN_SLOW=1 pytest tests/test_gpu_examples.py::TestLoftQ::test_bloomz_loftq_8bit

Expected behavior

The test passed before PR #42805. After that PR, it fails:

FAILED tests/test_gpu_examples.py::TestLoftQ::test_bloomz_loftq_8bit[cuda] - AssertionError: assert tensor(3.6478e-09, device='cuda:0', grad_fn=<MeanBackward0>) < (tensor(3.2703e-09, device='cud...
FAILED tests/test_gpu_examples.py::TestLoftQ::test_bloomz_loftq_8bit[cpu] - assert tensor(2.7105e-09, grad_fn=<MeanBackward0>) < (tensor(2.0073e-09, grad_fn=<MeanBackward0>) / 1.005)

However, I believe the change is correct. Before it, quantized models were always loaded as float16 models when no dtype was specified (that is, the embedding and lm_head weights were float16). After it, quantized models are loaded as float32 models when no dtype is specified. I just want to confirm that we are aligned on this behavior.
Once we have agreed on PR #42805, I will update the peft tests.
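
For reference, here is a minimal sketch of the margin check as it appears in the CPU failure message above, using only the two values printed there. The helper name `passes_margin` and the argument names `lhs` and `rhs` are hypothetical and chosen for illustration; only the `1.005` factor and the two numbers come from the log.

```python
# Sketch of the comparison shown in the CPU failure message:
#   assert lhs < (rhs / 1.005)
# `passes_margin`, `lhs`, and `rhs` are illustrative names, not peft identifiers.

def passes_margin(lhs: float, rhs: float, factor: float = 1.005) -> bool:
    """Return True when lhs is smaller than rhs by at least the given factor."""
    return lhs < rhs / factor


# Values copied from the CPU failure line in this issue.
cpu_lhs = 2.7105e-09
cpu_rhs = 2.0073e-09

cpu_ok = passes_margin(cpu_lhs, cpu_rhs)
ratio = cpu_lhs / cpu_rhs  # how far the left side is above the right side

print(f"CPU check passes: {cpu_ok}")
print(f"lhs / rhs = {ratio:.3f}")
```

With these numbers the left-hand side is roughly 1.35 times the right-hand side, so the check fails by a wide margin rather than by a borderline amount. That suggests adjusting the tolerance alone would not be enough if the float32 default stays.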

cc @BenjaminBossan @matthewdouglas
