Skip to content

Commit 7791f15

Merge pull request #936 from OptimalScale/lmflow-nightly
LMFlow major update - Accelerate support
2 parents 397f00d + 9dd1bc7 commit 7791f15

File tree

280 files changed: +8033 additions, -15668 deletions


.gitattributes

Lines changed: 1 addition & 1 deletion
@@ -3,4 +3,4 @@
 *.ipynb linguist-detectable=false
 *RAFT.pdf filter=lfs diff=lfs merge=lfs -text
 *.gif filter=lfs diff=lfs merge=lfs -text
-assets/*.gif filter=lfs diff=lfs merge=lfs -text
+docs/figs/*.gif filter=lfs diff=lfs merge=lfs -text

.gitignore

Lines changed: 2 additions & 4 deletions
@@ -18,12 +18,13 @@ log/
 regression_test/*/new_output_models
 regression_test/*/new_log
 output_dir/
+tests_out

 # data files
 data/

 # output models
-output_models/
+output_models
 adapter_model/

 # Distribution / packaging
@@ -168,9 +169,6 @@ debug.env
 #ctags
 tags

-# pre-commit
-.pre-commit*
-
 # .lock
 *.lock


.pre-commit-config.yaml

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+repos:
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    rev: "v0.11.4"
+    hooks:
+      - id: ruff
+        args: ["--fix", "--show-fixes", "--output-format=full"]
+        exclude: ^.*\.(ipynb)$
+      - id: ruff-format
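This new config wires Ruff linting and formatting into pre-commit. As a minimal sketch of how a contributor would typically activate it locally (assuming the `pre-commit` tool itself is installed separately; it is not pinned by this commit):

```bash
pip install pre-commit        # the hook runner itself; install however your environment prefers
pre-commit install            # register the git hook defined by .pre-commit-config.yaml
pre-commit run --all-files    # run ruff (with --fix) and ruff-format across the whole repo once
```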

README.md

Lines changed: 34 additions & 25 deletions
@@ -1,5 +1,5 @@
 <p align="center" width="50%">
-<img src="assets/logo.png" alt="LMFlow" style="width: 50%; min-width: 200px; display: block; margin: auto; background-color: transparent;">
+<img src="docs/assets/logo.png" alt="LMFlow" style="width: 50%; min-width: 200px; display: block; margin: auto; background-color: transparent;">
 </p>

 # LMFlow
@@ -26,25 +26,26 @@
 An extensible, convenient, and efficient toolbox for finetuning large machine learning models, designed to be user-friendly, speedy and reliable, and accessible to the entire community.

 <p align="center" width="100%">
-<img src="assets/features.png" alt="LMFlow-features" style="width: 100%; min-width: 300px; display: block; margin: auto;">
+<img src="docs/assets/features.png" alt="LMFlow-features" style="width: 100%; min-width: 300px; display: block; margin: auto;">
 </p>

 ## Latest News
+> [!IMPORTANT]
+> * :exclamation: [2025-07-09] We have a major update to LMFlow with full Accelerate support and extensive streamlining. If you're looking for the previous version, please use `git checkout v0.0.10`, or check out the [v0.0.10 branch](https://github.com/OptimalScale/LMFlow/tree/v0.0.10). View all releases [here](https://github.com/OptimalScale/LMFlow/tags).

-* [2025-03-18] With full support for Accelerate and lots of streamlining, LMFlow-nightly is now available! Feel free to try out the latest features and improvements by `git checkout lmflow-nightly`.
 * [2024-12-02] Support [Hymba](https://github.com/NVlabs/hymba), a new family of small language models featuring a hybrid-head parallel architecture. Check out [Post-training Hymba](https://github.com/OptimalScale/LMFlow/tree/main/experimental/Hymba) for more details.
 * [2024-07-01] 🏆 LMFlow receives the [**Best Demo Paper Award**](https://docs.google.com/presentation/d/1TVDooAZqkNObz5ysVhDFtqnnVHR-u8wqYvgix-gzPMs/edit#slide=id.g2e55907bbcc_0_70) at **NAACL 2024**! 🎉
 * [2024-06-30] Expanding Optimization Options! We now support custom optimizer training with a variety of optimizers. Dive into the details and try out the new features with our updated script at [custom_optimizers](https://github.com/OptimalScale/LMFlow/blob/main/scripts/run_finetune_with_custom_optim.sh).
 * [2024-04-25] :rocket: Support conversation template! We've preset the latest [Llama-3](https://huggingface.co/meta-llama/Meta-Llama-3-70B) and [Phi-3](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct) conversation templates as well as some frequently used templates such as `chatml` (see all templates [here](https://optimalscale.github.io/LMFlow/examples/DATASETS.html#conversation-template)), and we are working on adding more preset templates. Adding corresponding `--conversation_template` in the shell script and you are all set! :rocket:
-* [2024-03-27] Support [LISA](https://arxiv.org/abs/2403.17919), enabling 7B training in 24G memory without offloading!
-* [2023-09-11] Support [speculative decoding](https://arxiv.org/abs/2211.17192). Check out [speculative_decoding](https://github.com/OptimalScale/LMFlow/blob/main/scripts/speculative_decoding/README.md) for the usage and acceleration details.
-* [2023-08-14] Support long context inference with position interpolation (Linear & NTK scaling ) for LLaMA models. Check out [postion_interpolation](https://github.com/OptimalScale/LMFlow/blob/main/readme/Position_Interpolation.md) for more details.

 <details> <summary>More news...</summary>

+* [2024-03-27] Support [LISA](https://arxiv.org/abs/2403.17919), enabling 7B training in 24G memory without offloading!
+* [2023-09-11] Support [speculative decoding](https://arxiv.org/abs/2211.17192). Check out [speculative_decoding](https://github.com/OptimalScale/LMFlow/blob/main/scripts/speculative_decoding/README.md) for the usage and acceleration details.
+* [2023-08-14] Support long context inference with position interpolation (Linear & NTK scaling ) for LLaMA models. Check out [postion_interpolation](https://github.com/OptimalScale/LMFlow/blob/main/readme/Position_Interpolation.md) for more details.
 * [2023-08-07] Support [Flash Attention-2](https://crfm.stanford.edu/2023/07/17/flash2.html). Check out [flash_attention](https://github.com/OptimalScale/LMFlow/blob/main/readme/flash_attn2.md) for more details.
 * [2023-08-02] Support [Llama2](https://ai.meta.com/llama/), [ChatGLM2](https://huggingface.co/THUDM/chatglm2-6b), and [Baichuan](https://huggingface.co/baichuan-inc/Baichuan-7B) models.
-* [2023-07-23] [LMFlow multimodal chatbot](https://github.com/OptimalScale/LMFlow/blob/main/scripts/run_vis_chatbot_gradio_minigpt4.sh) is now available! Support multimodal inputs of images and texts. [Online Demo](http://multimodal.lmflow.online) is also provided (We hold the service on a single GPU, hence one may experience "queuing" or "application busy" sometimes when multiple users are accessing at the same time, please wait and attempt again later when such event happens)![image](https://github.com/OptimalScale/LMFlow/blob/rpan-vision-encoder/assets/multimodal-chatbot-demo.gif)
+* [2023-07-23] [LMFlow multimodal chatbot](https://github.com/OptimalScale/LMFlow/blob/main/scripts/run_vis_chatbot_gradio_minigpt4.sh) is now available! Support multimodal inputs of images and texts. [Online Demo](http://multimodal.lmflow.online) is also provided (We hold the service on a single GPU, hence one may experience "queuing" or "application busy" sometimes when multiple users are accessing at the same time, please wait and attempt again later when such event happens)![image](https://github.com/OptimalScale/LMFlow/blob/rpan-vision-encoder/docs/assets/multimodal-chatbot-demo.gif)
 * [2023-06-22] [LMFlow paper](https://arxiv.org/abs/2306.12420) is out! Check out our implementation details at https://arxiv.org/abs/2306.12420
 * [2023-06-16] Our finetuned Robin-33B-V2 scored an impressive 64.1 on the Huggingface LLM leaderboard in our offline evaluation, outperforming major open-source LLMs! All checkpoints (7B, 13B, 33B, and 65B) are [released](https://huggingface.co/OptimalScale)! Checkout the performance [here](https://medium.com/@hkust.ml/robin-v2-launches-achieves-unparalleled-performance-on-openllm-4f6886e822c1).
 * [2023-06-07] LMFlow is now officially available on PyPI! Install it with `pip install lmflow-finetune`!
@@ -69,11 +70,11 @@ An extensible, convenient, and efficient toolbox for finetuning large machine le
 - [LMFlow](#lmflow)
   - [Latest News](#latest-news)
   - [Table of Contents](#table-of-contents)
-  - [Supported Models](#supported-models)
   - [Quick Start](#quick-start)
     - [Setup](#setup)
     - [Prepare Dataset](#prepare-dataset)
     - [Finetuning](#finetuning)
+      - [Estimated Hardware Requirement](#estimated-hardware-requirement)
       - [Full Finetuning](#full-finetuning)
       - [LISA](#lisa)
       - [LoRA](#lora)
@@ -85,21 +86,6 @@ An extensible, convenient, and efficient toolbox for finetuning large machine le
   - [License](#license)
   - [Citation](#citation)

-## Supported Models
-
-See all conversation template details [here](https://optimalscale.github.io/LMFlow/examples/supported_conversation_template.html).
-
-| Model | Conversation Template |
-| :---: | :-------------------: |
-| DeepSeek | `deepseek` <br> `deepseek_v2` <br> `deepseek_r1` <br> `deepseek_r1_distill` <br> `deepseek_v3` |
-| Gemma | `gemma` |
-| Hymba | `hymba` |
-| InternLM2 | `internlm2` |
-| LLaMA | `llama2` <br> `llama3` <br> `llama3_for_tool`|
-| Phi | `phi3` |
-| Qwen | `qwen2` <br> `qwen2_for_tool` <br> `qwen2_5` <br> `qwen2_5_1m` <br> `qwen2_5_math` <br> `qwen_qwq` |
-| Yi | `yi` <br> `yi1_5` |
-| Zephyr | `zephyr` |

 ## Quick Start

@@ -108,15 +94,28 @@ See all conversation template details [here](https://optimalscale.github.io/LMFl
 Our package has been tested on Linux OS (Ubuntu 20.04). Other OS platforms (MacOS, Windows) are not fully tested, where you may encounter unexpected errors. If you are using LMFlow for the first time, we recommend you to try on a Linux machine or Google Colab.

 ```bash
-git clone -b v0.0.9 https://github.com/OptimalScale/LMFlow.git
+git clone -b v1.0.0 https://github.com/OptimalScale/LMFlow.git
+cd LMFlow
+conda create -n lmflow python=3.9 -y
+conda activate lmflow
+conda install mpi4py
+pip install -e .
+```
+
+<details><summary> Looking for a previous version? </summary>
+
+```bash
+git clone -b v0.0.10 https://github.com/OptimalScale/LMFlow.git
 cd LMFlow
 conda create -n lmflow python=3.9 -y
 conda activate lmflow
 conda install mpi4py
 pip install -e .
 ```

-<details><summary> for CUDA versions 10.3-11.7 </summary>
+</details>
+
+<details><summary> For CUDA versions 10.3-11.7 </summary>

 ```bash
 git clone -b v0.0.5 https://github.com/OptimalScale/LMFlow.git
@@ -162,6 +161,16 @@ Please refer to our [doc](https://optimalscale.github.io/LMFlow/examples/DATASET

 ### Finetuning

+#### Estimated Hardware Requirement
+
+| Method | 0.5B | 3B | 7B | 14B | 30B | 70B | `x`B |
+| ---------------------- | ---- | ---- | ---- | ----- | ----- | ----- | ------- |
+| Full `bf16`/`fp16` | 9GB | 55GB |120GB | 240GB | 600GB | 1200GB| `18x`GB |
+| LoRA | 1GB | 6GB | 16GB | 32GB | 64GB | 160GB | `2x`GB |
+| QLoRA `quant_bit=8` | 0.7GB| 3GB | 10GB | 20GB | 40GB | 80GB| `x`GB |
+| QLoRA `quant_bit=4` | 0.4GB| 1.5GB| 6GB | 12GB | 24GB | 48GB| `x/2`GB |
+
 #### Full Finetuning

 Full training updates all the parameters to finetune a language model.
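The new Estimated Hardware Requirement table scales roughly linearly with model size through its `x`B column. As a rough back-of-envelope sketch only (the formulas come from that column; actual usage also depends on sequence length, batch size, optimizer state, and framework overhead):

```bash
# Ballpark GPU-memory estimate from the `x`B column of the table above.
# x = model size in billions of parameters; shell arithmetic is integer-only,
# so treat the output as an order-of-magnitude guide rather than an exact figure.
x=7
echo "Full bf16/fp16 : ~$((18 * x)) GB"
echo "LoRA           : ~$((2 * x)) GB"
echo "QLoRA (8-bit)  : ~${x} GB"
echo "QLoRA (4-bit)  : ~$((x / 2)) GB"
```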

configs/accelerate_fsdp_config.yaml

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+compute_environment: LOCAL_MACHINE
+debug: false
+distributed_type: FSDP
+
+fsdp_config:
+  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
+  fsdp_min_num_params: 1000000
+  fsdp_backward_prefetch: BACKWARD_PRE
+  fsdp_forward_prefetch: false
+  fsdp_cpu_ram_efficient_loading: true
+  fsdp_offload_params: false
+  fsdp_sharding_strategy: FULL_SHARD
+  fsdp_state_dict_type: FULL_STATE_DICT
+  fsdp_sync_module_states: true
+  fsdp_use_orig_params: true
+
+downcast_bf16: true
+machine_rank: 0
+main_training_function: main
+mixed_precision: bf16
+num_machines: 1
+num_processes: 8 # NOTE: distributed_type should be `NO` if you're training on a single GPU
+rdzv_backend: static
+same_network: true
+tpu_env: []
+tpu_use_cluster: false
+tpu_use_sudo: false
+use_cpu: false
+main_process_port: 1204
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+compute_environment: LOCAL_MACHINE
+debug: false
+distributed_type: 'NO'
+
+fsdp_config:
+  fsdp_auto_wrap_policy: SIZE_BASED_WRAP
+  fsdp_min_num_params: 1000000
+  fsdp_backward_prefetch: BACKWARD_PRE
+  fsdp_forward_prefetch: false
+  fsdp_cpu_ram_efficient_loading: true
+  fsdp_offload_params: false
+  fsdp_sharding_strategy: 'NO_SHARD'
+  fsdp_state_dict_type: FULL_STATE_DICT
+  fsdp_sync_module_states: true
+  fsdp_use_orig_params: true
+
+downcast_bf16: true
+machine_rank: 0
+main_training_function: main
+mixed_precision: bf16
+num_machines: 1
+num_processes: 1
+rdzv_backend: static
+same_network: true
+tpu_env: []
+tpu_use_cluster: false
+tpu_use_sudo: false
+use_cpu: false
+main_process_port: 1204
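Both Accelerate configs above are meant to be consumed by `accelerate launch`. A minimal sketch of a launch command under that assumption (the training script path and its arguments below are placeholders, not taken from this commit):

```bash
# Multi-GPU FSDP run using the new config; the script name and flags are illustrative only.
accelerate launch --config_file configs/accelerate_fsdp_config.yaml \
    your_finetune_script.py --model_name_or_path <model> --dataset_path <data>

# The single-GPU variant (distributed_type: 'NO', num_processes: 1) is launched the same
# way, pointing --config_file at that config file instead.
```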
File renamed without changes.
