Quantized model behavior changed #43725

### System Info

- torch 2.10.0
- peft 0.18.2.dev0
- bitsandbytes 0.49.1

The only variable between the passing and failing runs is the transformers version.

### Who can help?

@ArthurZucker 

### Reproduction

The regression shows up in the PEFT test suite:
https://github.com/jiqing-feng/peft/blob/8bit/tests/test_gpu_examples.py#L2901
Reproduce with `RUN_SLOW=1 pytest tests/test_gpu_examples.py::TestLoftQ::test_bloomz_loftq_8bit`

### Expected behavior

These tests passed before https://github.com/huggingface/transformers/pull/42805; with that PR applied, they fail:
```
FAILED tests/test_gpu_examples.py::TestLoftQ::test_bloomz_loftq_8bit[cuda] - AssertionError: assert tensor(3.6478e-09, device='cuda:0', grad_fn=<MeanBackward0>) < (tensor(3.2703e-09, device='cud...
FAILED tests/test_gpu_examples.py::TestLoftQ::test_bloomz_loftq_8bit[cpu] - assert tensor(2.7105e-09, grad_fn=<MeanBackward0>) < (tensor(2.0073e-09, grad_fn=<MeanBackward0>) / 1.005)
```
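
To make the failure concrete, here is a minimal sketch that replays the check from the messages above, `lhs < rhs / 1.005`, using the printed values. I am assuming from the shape of the assertion that the left side is the LoftQ error and the right side is the plain-quantization baseline; `passes_margin` is a hypothetical helper, not part of PEFT.

```python
def passes_margin(lhs: float, rhs: float, factor: float = 1.005) -> bool:
    """Hypothetical mirror of the failing check `lhs < rhs / factor`."""
    return lhs < rhs / factor


# Values copied from the failure messages above.
cuda_lhs, cuda_rhs = 3.6478e-09, 3.2703e-09
cpu_lhs, cpu_rhs = 2.7105e-09, 2.0073e-09

for name, lhs, rhs in [("cuda", cuda_lhs, cuda_rhs), ("cpu", cpu_lhs, cpu_rhs)]:
    threshold = rhs / 1.005
    # Both devices print pass=False: lhs is above rhs itself, not just above the margin.
    print(f"{name}: lhs={lhs:.4e} threshold={threshold:.4e} pass={passes_margin(lhs, rhs)}")
```

On both devices the left-hand value is larger than the right-hand value outright, so the failure is not a near miss on the 0.5% margin.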

However, I think the change is correct. Before it, quantized models were always loaded as float16 when no dtype was specified, which determined the dtype of weights such as the embedding and lm_head. After it, quantized models are loaded as float32 when no dtype is specified. I just want to confirm that we agree on this new behavior.
Once we agree on https://github.com/huggingface/transformers/pull/42805, I will update the PEFT tests.
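
The default dtype can move the test metrics even though the linear layers are quantized, because weights like the embedding and lm_head are now kept at a different floating-point precision. A minimal, stdlib-only sketch of that precision gap round-trips one number through float16 and float32 with `struct`. It loads no model, and `weight` is an arbitrary illustrative value, not taken from bloomz.

```python
import struct


def round_trip(value: float, fmt: str) -> float:
    """Pack `value` with a struct format code and unpack it again.

    "e" is IEEE 754 half precision (float16), "f" is single precision (float32).
    """
    return struct.unpack("<" + fmt, struct.pack("<" + fmt, value))[0]


weight = 0.1234567  # arbitrary illustrative value, not from any real checkpoint

as_fp16 = round_trip(weight, "e")
as_fp32 = round_trip(weight, "f")

print(f"float16: {as_fp16!r}  abs error {abs(as_fp16 - weight):.2e}")
print(f"float32: {as_fp32!r}  abs error {abs(as_fp32 - weight):.2e}")
```

The float16 copy is off by roughly 1e-5 while the float32 copy is off by less than 1e-8, so it seems plausible that switching the unquantized weights between the two dtypes shifts error metrics of the size reported above. If the PEFT test should keep exercising the old numerics, one option (not verified here) is to pass the dtype explicitly to `from_pretrained`, e.g. `dtype=torch.float16` (older transformers releases name this argument `torch_dtype`).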

cc @BenjaminBossan @matthewdouglas 
