Fix multi-GPU loading for quantized models in distributed training by Fizza-Mukhtar · Pull Request #3917 · unslothai/unsloth

Fizza-Mukhtar · 2026-01-21T09:57:36Z

Problem

Multi-GPU training fails for quantized (4bit/8bit/fp8) models due to
Accelerate attempting to move the model across devices.

Root Cause

Quantized models cannot be relocated after loading.
The model was loaded before per-rank device placement was enforced.

Solution

- Detect distributed training at load time
- Load quantized models on the correct per-rank GPU
- Preserve existing behavior for single-GPU and expert users

Impact

Enables stable multi-GPU training for GRPO and Vision models with quantization.

fixes #3914

When using torchrun with quantized models (4bit/8bit/fp8), each rank must load the model directly onto its own GPU. The default device_map ("sequential") places everything on GPU 0, causing illegal memory access errors when Accelerate tries to relocate quantized weights.

Use the existing prepare_device_map() utility from loader_utils to detect distributed training via the LOCAL_RANK/WORLD_SIZE env vars and override device_map to target each rank's local GPU. This is applied in both FastLanguageModel.from_pretrained and FastModel.from_pretrained, covering text, vision, and audio model paths.

Fixes unslothai#3914
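The per-rank override described in the commit message can be sketched roughly as follows. This is an illustrative stand-in, not the code of prepare_device_map(): the function name sketch_device_map, its parameters, and the exact conditions are assumptions. The only parts taken from the PR are the LOCAL_RANK/WORLD_SIZE env vars set by torchrun and the "sequential" default. The `{"": index}` form is the usual Hugging Face convention for placing a whole model on one device; whether Unsloth returns exactly that form is also an assumption.

```python
import os


def sketch_device_map(default="sequential", environ=None):
    """Hypothetical helper: pick a device_map for the current process.

    Not the real prepare_device_map(); it only illustrates the idea of
    detecting a torchrun launch and pinning the model to the local GPU.
    """
    env = os.environ if environ is None else environ

    # torchrun exports WORLD_SIZE and LOCAL_RANK for every worker process.
    world_size = int(env.get("WORLD_SIZE", "1"))
    local_rank = env.get("LOCAL_RANK")

    if world_size > 1 and local_rank is not None:
        # Place the entire model on this rank's GPU so that nothing has
        # to be relocated after the quantized weights are loaded.
        return {"": int(local_rank)}

    # Single-process run: leave the existing default untouched.
    return default


if __name__ == "__main__":
    print(sketch_device_map())
```

Run under plain `python`, the sketch leaves the default in place; launched with `torchrun --nproc_per_node=2`, each worker would get a map pointing at its own GPU index.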

danielhanchen · 2026-02-09T12:26:18Z

Thanks @Fizza-Mukhtar - had to fix some of your changes - appreciate the help

danielhanchen force-pushed the fix/grpo-multigpu-quantized-loading-v2 branch from be668e0 to b2a66fc February 9, 2026 12:23

danielhanchen merged commit da1eacc into unslothai:main Feb 9, 2026
1 check passed

Fizza-Mukhtar deleted the fix/grpo-multigpu-quantized-loading-v2 branch February 10, 2026 06:42

danielhanchen merged 1 commit into unslothai:main from Fizza-Mukhtar:fix/grpo-multigpu-quantized-loading-v2
