Feature request
Recently, we added the ability to load gguf files within transformers.

The goal is to give users the possibility to further train/fine-tune their gguf models.
Workflow
1) Load the gguf file in transformers: we dequantize the weights to fp32, then we load the weights to be used with PyTorch.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"
filename = "tinyllama-1.1b-chat-v1.0.Q6_K.gguf"

tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename)
```
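The dequantization mentioned in step 1 can be sketched as follows. This is a minimal illustration of the idea behind a Q8_0-style format (per-block int8 values plus one float scale per block), not the actual transformers/gguf implementation; the function name and block size here are assumptions for the sketch.

```python
import numpy as np

BLOCK_SIZE = 32  # assumption: a Q8_0-style format groups 32 weights per block


def dequantize_q8_0(scales: np.ndarray, qs: np.ndarray) -> np.ndarray:
    """Sketch: recover fp32 weights from per-block scales and int8 values.

    scales: (n_blocks,) float32, one scale per block
    qs:     (n_blocks, BLOCK_SIZE) int8 quantized values
    """
    return (scales[:, None] * qs.astype(np.float32)).reshape(-1)


# Round trip on toy data: quantize, then dequantize back to fp32.
rng = np.random.default_rng(0)
w = rng.standard_normal(2 * BLOCK_SIZE).astype(np.float32).reshape(2, BLOCK_SIZE)
scales = np.abs(w).max(axis=1) / 127.0          # one scale per block
qs = np.round(w / scales[:, None]).astype(np.int8)
w_fp32 = dequantize_q8_0(scales, qs)
```

The recovered fp32 weights differ from the originals by at most half a quantization step per element, which is why fine-tuning on the dequantized weights is possible but starts from a slightly lossy checkpoint.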
2) Train/fine-tune the model.
3) Convert the model back to gguf to use in the ggml ecosystem, using the convert_hf_to_gguf script or the gguf-my-repo space if you pushed your model to the Hub:

```python
tokenizer.save_pretrained('directory')
model.save_pretrained('directory')
```

```shell
!python ${path_to_llama_cpp}/convert-hf-to-gguf.py ${directory}
```
Let's try to add GGUF support for more architectures! Currently supported architectures are:
- Llama
- Mistral
- Qwen2
It would be great to add support for more architectures, such as:
- Phi3 (Add support for GGUF Phi-3 #31844)
- Qwen2Moe (Add Qwen2Moe GGUF loading support #33264)
- Gemma2
- T5 (Add T5 GGUF loading support #33389)
- Falcon (Add falcon gguf #33437)
- Bloom (Add gguf support for bloom #33473)
- StableLM (Add gguf support for StableLM #33793)
- gpt2 (Add gguf support for gpt2 #34044)
- starcoder2 (Add GGUF for starcoder2 #34094)
- llama4
- Deepseekv3
- c4ai-command-a
... and many more (feel free to suggest more architectures! The model needs to be integrated in transformers).
Adding this feature would require following the same protocol as in this PR:
- Update `GGUF_TENSOR_MAPPING` and `GGUF_CONFIG_MAPPING` in order to map the tensors/config of the gguf file to the ones in transformers.
- Create a `GGUFXXXConverter(XXXConverter)` class to convert the gguf tokenizer to a transformers one.
- Write tests.
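To make the first bullet concrete, here is a sketch of what such a tensor-name mapping does: it translates gguf tensor names (e.g. `blk.0.attn_q.weight`) into the corresponding transformers state-dict keys. The dictionary entries and helper below are illustrative assumptions, not the real `GGUF_TENSOR_MAPPING` table in transformers.

```python
# Illustrative mapping from gguf name components to transformers ones
# (assumption: these entries are examples, not the real GGUF_TENSOR_MAPPING).
GGUF_TO_TRANSFORMERS = {
    "token_embd": "model.embed_tokens",
    "blk": "model.layers",
    "attn_q": "self_attn.q_proj",
    "attn_k": "self_attn.k_proj",
    "attn_v": "self_attn.v_proj",
    "output_norm": "model.norm",
}


def rename_tensor(gguf_name: str, mapping: dict) -> str:
    """Replace each dotted component of a gguf tensor name using the mapping;
    components with no entry (e.g. layer indices, 'weight') pass through."""
    return ".".join(mapping.get(part, part) for part in gguf_name.split("."))
```

For example, `rename_tensor("blk.0.attn_q.weight", GGUF_TO_TRANSFORMERS)` returns `"model.layers.0.self_attn.q_proj.weight"`. Adding a new architecture mostly means filling in entries like these for every tensor and config key the gguf file uses.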
If you are interested in taking up the challenge, comment below with the name of the architecture you want to integrate and open a PR!
Once you open a PR, feel free to ping @SunMarc @LysandreJik @ArthurZucker for a review!
Motivation
Support for more gguf models
Your contribution
Reviewing PRs and possibly adding support for more models