Hotfix: Set float32 as default dtype for testing tiny models by albertvillanova · Pull Request #4770 · huggingface/trl

albertvillanova · 2026-01-02T14:11:40Z

Set float32 as default dtype for testing tiny models, after the merge in transformers of this PR:

Default auto 🚨 🚨 transformers#42805

HuggingFaceDocBuilderDev · 2026-01-02T14:14:31Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

qgallouedec · 2026-01-02T17:37:36Z

After investigating, here are a few elements that can help understand what's happening here:

Transformers dtype default behavior changed

With transformers<=4.57, if we omit dtype in from_pretrained, the model is loaded in float32 by default. However, if we pass dtype="auto", the dtype follows the model config / checkpoint metadata:

from transformers import AutoModelForCausalLM  # v4.57.2

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B")
model.dtype  # torch.float32

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B", dtype="auto")
model.dtype  # torch.bfloat16

Starting with transformers v5, dtype="auto" appears to be the new default (which is better IMO), so models may now be loaded directly in bf16/fp16 depending on the model/config:

from transformers import AutoModelForCausalLM  # v5.0.0

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B")
model.dtype  # torch.bfloat16

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B", dtype="auto")
model.dtype  # torch.bfloat16

This explains why some tests that previously ran in float32 now run in float16/bfloat16, and as you pointed out, this can lead to situations where some parameters are not updated.
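One plausible mechanism for parameters that appear not to update (the thread does not name the exact cause) is that a small optimizer step is rounded away when the weights themselves are stored in bfloat16. The sketch below emulates bfloat16 rounding with the standard library only; `to_float32`, `to_bfloat16` and `sgd_step` are illustrative helpers, not part of transformers or TRL, and the learning rate and gradient are made-up values.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bfloat16(x: float) -> float:
    """Round to the nearest bfloat16 value (ties to even).

    Simplified emulation: NaN and infinity are not handled.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 keeps only the upper 16 bits of the float32 encoding,
    # so round on the lower 16 bits before masking them out.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding_bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def sgd_step(weight: float, grad: float, lr: float, cast) -> float:
    """Apply one plain SGD update and store the result via `cast`."""
    return cast(weight - lr * grad)


# Same update, two storage dtypes (illustrative numbers).
w_fp32 = sgd_step(1.0, grad=1.0, lr=1e-3, cast=to_float32)
w_bf16 = sgd_step(1.0, grad=1.0, lr=1e-3, cast=to_bfloat16)

print(w_fp32 != 1.0)  # True: the float32 weight moved
print(w_bf16 == 1.0)  # True: the bfloat16 weight was rounded back to 1.0
```

Near 1.0, adjacent bfloat16 values are roughly 0.004 to 0.008 apart, so a step of 0.001 is lost entirely, while float32 storage keeps it.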

Longer-term: TRL should provide training-oriented defaults

More broadly, I think TRL should aim to provide safe and stable defaults for training.

In particular, we should distinguish between:

weight dtype at load time (how parameters are stored)
compute dtype during training (forward/backward autocast, grad scaling, etc.)

From a training stability perspective, the most robust default is usually:

load weights in float32 by default, unless the user explicitly requests otherwise
use mixed precision as a training-time optimization (bf16)

The second point is already aligned with TRL defaults (e.g. enabling mixed precision in configs):

trl/trl/trainer/sft_config.py

Lines 124 to 131 in 2337cc9

    
    bf16: bool | None = field(
        default=None,
        metadata={
            "help": "Whether to use bf16 (mixed) precision instead of 32-bit. Requires Ampere or higher NVIDIA "
            "architecture or Intel XPU or using CPU (use_cpu) or Ascend NPU. If not set, it defaults to `True` if "
            "`fp16` is not set."
        },
    )

trl/trl/trainer/sft_config.py

Line 277 in 2337cc9

self.bf16 = not (self.fp16) if self.bf16 is None else self.bf16
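To make the resolution rule in the quoted line concrete, here is a minimal sketch. `PrecisionConfig` is a hypothetical stand-in that models only the two precision fields, and the `fp16=False` default is an assumption; it is not the actual `SFTConfig` class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PrecisionConfig:
    """Hypothetical stand-in for the two precision fields of a training config."""

    fp16: bool = False  # assumed default, for illustration only
    bf16: Optional[bool] = None

    def __post_init__(self) -> None:
        # Same expression as the quoted line: an explicit bf16 value wins,
        # otherwise bf16 is enabled exactly when fp16 is not.
        self.bf16 = not (self.fp16) if self.bf16 is None else self.bf16


print(PrecisionConfig().bf16)            # True
print(PrecisionConfig(fp16=True).bf16)   # False
print(PrecisionConfig(bf16=False).bf16)  # False
```

Note that this only decides the compute precision flag; it says nothing about the dtype the weights are loaded in.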

However, it looks like the load dtype often follows the model dtype, which can implicitly put users/tests into fp16/bf16 without intent:

trl/trl/trainer/utils.py

Line 1145 in 2337cc9

dtype = kwargs.get("dtype", "auto")

Proposal

A longer-term solution could be:

Make fp32 the default load dtype when the user passes a model ID
In tests that manually load models (e.g. this one), explicitly set dtype=float32 so the tests don't depend on upstream defaults

The key idea is: we should not end up training in the model dtype unless it’s intentional, especially in tests that are not meant to validate this specific (and likely unstable) case.
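A minimal sketch of the first point of the proposal, assuming the loading kwargs arrive as a plain dict. `resolve_load_kwargs` is a hypothetical helper, not TRL's actual code, and the string "float32" stands in for `torch.float32` so the example runs without torch.

```python
from typing import Any, Optional


def resolve_load_kwargs(
    model_init_kwargs: Optional[dict[str, Any]] = None,
    default_dtype: str = "float32",
) -> dict[str, Any]:
    """Return a copy of the loading kwargs with an explicit dtype."""
    kwargs = dict(model_init_kwargs or {})
    # Fill in the default only when the caller did not choose a dtype,
    # so an explicit request (e.g. "bfloat16" or "auto") is respected.
    kwargs.setdefault("dtype", default_dtype)
    return kwargs


print(resolve_load_kwargs())                       # {'dtype': 'float32'}
print(resolve_load_kwargs({"dtype": "bfloat16"}))  # {'dtype': 'bfloat16'}
```

Compared with `kwargs.get("dtype", "auto")`, the only change is the fallback value: omitting dtype no longer means following the checkpoint dtype.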

albertvillanova · 2026-01-05T10:48:01Z

Thanks for your review, @qgallouedec: I totally agree.

In this PR, I was running a preliminary check that setting float32 as the default precision at loading time does fix the CI failures, and it does: https://github.com/huggingface/trl/actions/runs/20663612268/job/59331232203?pr=4770

In line with your long-term proposal, I agree we should set float32 as the default precision at loading time.

qgallouedec

lgtm then!

Set float32 as default dtype for testing tiny models

c2a37a0

Patch base classes instead

b8022c8

qgallouedec approved these changes Jan 5, 2026

albertvillanova changed the title ~~Set float32 as default dtype for testing tiny models~~ Hotfix: Set float32 as default dtype for testing tiny models Jan 6, 2026

albertvillanova merged commit ca16441 into huggingface:main Jan 6, 2026
8 of 9 checks passed

albertvillanova added a commit to albertvillanova/trl that referenced this pull request Jan 6, 2026

Revert "Hotfix: Set float32 as default dtype for testing tiny models (huggingface#4770)"

8fa8a41

This reverts commit ca16441.

albertvillanova mentioned this pull request Jan 6, 2026

Set dtype default to float32 #4778

Merged

brozjak2 mentioned this pull request Mar 20, 2026

Outdated documentation after Transformers v5 dtype default behavior changed #5329

Open

songhappy pushed a commit to songhappy/trl that referenced this pull request Apr 20, 2026

Hotfix: Set float32 as default dtype for testing tiny models (huggingface#4770)

c849c03

albertvillanova merged 2 commits into huggingface:main from albertvillanova:fix-4748
