Feature request
Recently, we added the ability to load gguf files within transformers.

The goal is to give users the possibility to further train/fine-tune their gguf models.
Workflow
1) Load the gguf file in transformers: we dequantize the weights to fp32, then we load the weights to be used with PyTorch.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"
filename = "tinyllama-1.1b-chat-v1.0.Q6_K.gguf"

tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename)
```
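The dequantization mentioned in step 1 can be sketched as follows. This is a minimal illustration of the idea behind a Q8_0-style format (per-block int8 values plus one float scale per block), not the actual transformers/gguf implementation; the function name and block size here are assumptions for the sketch.

```python
import numpy as np

BLOCK_SIZE = 32  # assumption: a Q8_0-style format groups 32 weights per block


def dequantize_q8_0(scales: np.ndarray, qs: np.ndarray) -> np.ndarray:
    """Sketch: recover fp32 weights from per-block scales and int8 values.

    scales: (n_blocks,) float32, one scale per block
    qs:     (n_blocks, BLOCK_SIZE) int8 quantized values
    """
    return (scales[:, None] * qs.astype(np.float32)).reshape(-1)


# Round trip on toy data: quantize, then dequantize back to fp32.
rng = np.random.default_rng(0)
w = rng.standard_normal(2 * BLOCK_SIZE).astype(np.float32).reshape(2, BLOCK_SIZE)
scales = np.abs(w).max(axis=1) / 127.0          # one scale per block
qs = np.round(w / scales[:, None]).astype(np.int8)
w_fp32 = dequantize_q8_0(scales, qs)
```

The recovered fp32 weights differ from the originals by at most half a quantization step per element, which is why fine-tuning on the dequantized weights is possible but starts from a slightly lossy checkpoint.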
2) Train/fine-tune the model.
3) Convert the model back to gguf to use in the ggml ecosystem, using the convert_hf_to_gguf script or the gguf-my-repo space if you pushed your model to the Hub:

```python
tokenizer.save_pretrained('directory')
model.save_pretrained('directory')
```

```shell
!python ${path_to_llama_cpp}/convert-hf-to-gguf.py ${directory}
```
Let's try to add GGUF support for more architectures! Currently supported architectures are:
- Llama
- Mistral
- Qwen2
It would be great to add support for more architectures, such as:
- Phi3 (Add support for GGUF Phi-3 #31844)
- Qwen2Moe (Add Qwen2Moe GGUF loading support #33264)
- Gemma2
- T5 (Add T5 GGUF loading support #33389)
- Falcon (Add falcon gguf #33437)
- Bloom (Add gguf support for bloom #33473)
- StableLM (Add gguf support for StableLM #33793)
- gpt2 (Add gguf support for gpt2 #34044)
- starcoder2 (Add GGUF for starcoder2 #34094)
- llama4
- Deepseekv3
- c4ai-command-a
... and many more (feel free to suggest more architectures! The model needs to be integrated in transformers).
Adding this feature would require following the same protocol as in this PR:
- Update `GGUF_TENSOR_MAPPING` and `GGUF_CONFIG_MAPPING` in order to map the tensors/config of the gguf file to the ones in transformers.
- Create a `GGUFXXXConverter(XXXConverter)` class to convert the gguf tokenizer to a transformers one.
- Write tests.
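To make the first bullet concrete, here is a sketch of what such a tensor-name mapping does: it translates gguf tensor names (e.g. `blk.0.attn_q.weight`) into the corresponding transformers state-dict keys. The dictionary entries and helper below are illustrative assumptions, not the real `GGUF_TENSOR_MAPPING` table in transformers.

```python
# Illustrative mapping from gguf name components to transformers ones
# (assumption: these entries are examples, not the real GGUF_TENSOR_MAPPING).
GGUF_TO_TRANSFORMERS = {
    "token_embd": "model.embed_tokens",
    "blk": "model.layers",
    "attn_q": "self_attn.q_proj",
    "attn_k": "self_attn.k_proj",
    "attn_v": "self_attn.v_proj",
    "output_norm": "model.norm",
}


def rename_tensor(gguf_name: str, mapping: dict) -> str:
    """Replace each dotted component of a gguf tensor name using the mapping;
    components with no entry (e.g. layer indices, 'weight') pass through."""
    return ".".join(mapping.get(part, part) for part in gguf_name.split("."))
```

For example, `rename_tensor("blk.0.attn_q.weight", GGUF_TO_TRANSFORMERS)` returns `"model.layers.0.self_attn.q_proj.weight"`. Adding a new architecture mostly means filling in entries like these for every tensor and config key the gguf file uses.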
If you are interested in taking up the challenge, comment below with the name of the architecture you want to integrate and open a PR!
Once you open a PR, feel free to ping @SunMarc @LysandreJik @ArthurZucker for a review!
Motivation
Support for more gguf models
Your contribution
Reviewing PRs and possibly adding support for more models