
Commit 5642ebf

Init commit (move from the private repo to the public one)
1 parent eff797c commit 5642ebf

110 files changed: 31,425 additions, 52 deletions (large commit; only a subset of the changed files is shown below)

.gitignore

Lines changed: 26 additions & 51 deletions
@@ -1,6 +1,26 @@
+.vscode/
+*.pt.trace.json
+*.hatchet
+tmps/
+results/
+unsloth_compiled_cache/
+benchmarks/llama-factory/LLaMA-Factory/data/
+benchmarks/llama-factory/LLaMA-Factory/
+benchmarks/llama-factory/results*
+benchmarks/llama-factory/data/
+benchmarks/peft/datasets/
+benchmarks/peft/llama3-8b-gsm8k-lora/
+lorafusion/simulator/_*.py
+benchmarks/DoRA/
+
+# NPM
+node_modules
+package-lock.json
+/package.json
+
 # Byte-compiled / optimized / DLL files
 __pycache__/
-*.py[codz]
+*.py[cod]
 *$py.class

 # C extensions
@@ -46,7 +66,7 @@ htmlcov/
 nosetests.xml
 coverage.xml
 *.cover
-*.py.cover
+*.py,cover
 .hypothesis/
 .pytest_cache/
 cover/
@@ -94,36 +114,23 @@ ipython_config.py
 # install all needed dependencies.
 #Pipfile.lock

-# UV
-# Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
-# This is especially recommended for binary packages to ensure reproducibility, and is more
-# commonly ignored for libraries.
-#uv.lock
-
 # poetry
 # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
 # This is especially recommended for binary packages to ensure reproducibility, and is more
 # commonly ignored for libraries.
 # https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
 #poetry.lock
-#poetry.toml

 # pdm
 # Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
-# pdm recommends including project-wide configuration in pdm.toml, but excluding .pdm-python.
-# https://pdm-project.org/en/latest/usage/project/#working-with-version-control
 #pdm.lock
-#pdm.toml
+# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
+# in version control.
+# https://pdm.fming.dev/latest/usage/project/#working-with-version-control
+.pdm.toml
 .pdm-python
 .pdm-build/

-# pixi
-# Similar to Pipfile.lock, it is generally recommended to include pixi.lock in version control.
-#pixi.lock
-# Pixi creates a virtual environment in the .pixi directory, just like venv module creates one
-# in the .venv directory. It is recommended not to include this directory in version control.
-.pixi
-
 # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
 __pypackages__/

@@ -136,7 +143,6 @@ celerybeat.pid

 # Environments
 .env
-.envrc
 .venv
 env/
 venv/
@@ -174,34 +180,3 @@ cython_debug/
 # and can be added to the global gitignore or merged into this file. For a more nuclear
 # option (not recommended) you can uncomment the following to ignore the entire idea folder.
 #.idea/
-
-# Abstra
-# Abstra is an AI-powered process automation framework.
-# Ignore directories containing user credentials, local state, and settings.
-# Learn more at https://abstra.io/docs
-.abstra/
-
-# Visual Studio Code
-# Visual Studio Code specific template is maintained in a separate VisualStudioCode.gitignore
-# that can be found at https://github.com/github/gitignore/blob/main/Global/VisualStudioCode.gitignore
-# and can be added to the global gitignore or merged into this file. However, if you prefer,
-# you could uncomment the following to ignore the entire vscode folder
-# .vscode/
-
-# Ruff stuff:
-.ruff_cache/
-
-# PyPI configuration file
-.pypirc
-
-# Cursor
-# Cursor is an AI-powered code editor. `.cursorignore` specifies files/directories to
-# exclude from AI features like autocomplete and code analysis. Recommended for sensitive data
-# refer to https://docs.cursor.com/context/ignore-files
-.cursorignore
-.cursorindexingignore
-
-# Marimo
-marimo/_static/
-marimo/_lsp/
-__marimo__/
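
The new ignore rules can be spot-checked with `git check-ignore`; a minimal sketch run from the repository root, with illustrative paths:

```bash
# Print which ignore pattern (file and line) matches each candidate path.
# Paths that are not ignored produce no output and a non-zero exit status.
git check-ignore -v results/ tmps/ benchmarks/DoRA/ node_modules
```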

.hadolint.yaml

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+ignored:
+  - DL3006 # Always tag the version of an image explicitly.
+  - DL3013 # Pin versions in pip.
+  - DL3008 # Pin versions in apt get install.
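
Hadolint reads this configuration automatically when invoked from the repository root; a minimal usage sketch, assuming an illustrative `Dockerfile` path to lint:

```bash
# DL3006, DL3008, and DL3013 are suppressed via the ignored list in .hadolint.yaml.
hadolint Dockerfile
# The config file can also be passed explicitly:
hadolint --config .hadolint.yaml Dockerfile
```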

.pre-commit-config.yaml

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
+exclude: |
+  (?x)(
+    ^playgrounds/|
+    ^benchmarks/|
+    ^\.git/|
+    ^\.ruff_cache/|
+    ^loomtrain\.egg-info/|
+    \.pt\.trace\.json$|
+    \.txt$
+  )
+repos:
+  - repo: https://github.com/psf/black
+    rev: 25.1.0
+    hooks:
+      - id: black
+        language_version: python3
+        args: [--line-length=88]
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    rev: v0.12.3
+    hooks:
+      - id: ruff
+        args: [--fix, --exit-non-zero-on-fix]
+      - id: ruff-format
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v5.0.0
+    hooks:
+      - id: check-added-large-files
+        args: [--maxkb=1000]
+      - id: check-json
+      - id: check-yaml
+      - id: check-toml
+      - id: end-of-file-fixer
+      - id: trailing-whitespace
+  - repo: https://github.com/scop/pre-commit-shfmt
+    rev: v3.12.0-1
+    hooks:
+      - id: shfmt
+        args:
+          - --indent=2
+          - --write
+          - --simplify
+  - repo: https://github.com/shellcheck-py/shellcheck-py
+    rev: v0.10.0.1
+    hooks:
+      - id: shellcheck
+  - repo: https://github.com/hadolint/hadolint
+    rev: v2.12.0
+    hooks:
+      - id: hadolint-docker
+        name: dockerfile-lint
+        types_or: [dockerfile]
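
To run these hooks locally, the standard pre-commit workflow applies; a minimal sketch, assuming `pre-commit` is available in the active Python environment:

```bash
# Register the git hook so the checks run on every commit.
pre-commit install
# Run every configured hook against the entire repository once.
pre-commit run --all-files
```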

README.md

Lines changed: 14 additions & 1 deletion
@@ -1 +1,14 @@
-# lorafusion
+# LoRAFusion
+
+LoRAFusion: Efficient LoRA Fine-Tuning for LLMs
+
+[![Python 3.12+](https://img.shields.io/badge/python-3.12+-blue.svg)](https://www.python.org/)
+[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit)](https://github.com/pre-commit/pre-commit)
+
+## Guides and Documentation
+
+- General Guides
+  - [Installation](./docs/installation.md)
+  - [Development](./docs/development.md)
+- Artifact Evaluation
+  - [Evaluation Instructions](./benchmarks_paper/README.md)

benchmarks_paper/README.md

Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
+# LoRAFusion Artifact Evaluation
+
+> Note: The figure numbering can be confusing. Figure "0" is the data-distribution plot, and Figure "1" is the
+main end-to-end plot in the evaluation section. Figures "2" and "3" are the two subfigures for the L40 GPU evaluation. Figure "4"
+shows the scaling behavior of the different methods. Figure "5" shows forward/backward kernel performance. Figure "6" shows
+normalized per-layer performance. Figure "7" is the kernel NCU profile. Figure "8" shows the pipeline bubbles for the different
+configurations.
+
+We provide the source code of LoRAFusion and scripts to reproduce the major experimental results from the paper.
+This appendix shows how to generate the plots in Figure 0 (data distributions), Figure 1 (end-to-end results), Figure 5 (kernel performance), Figure 6 (layer-wise performance), and Figure 7 (memory traffic reduction).
+We provide installation instructions and scripts to set up the environment.
+To reproduce the results, you need at least 192 GB of RAM, 256 GB of disk space, and 4 NVIDIA H100 GPUs.
+
+## Description & Requirements
+
+### Hardware dependencies
+You need a Linux machine with at least 192 GB of RAM, 256 GB of free disk space, and 4 NVIDIA H100 GPUs connected via NVLink.
+
+### Software dependencies
+You need Conda to set up the environment. The environment includes CUDA 12.6, PyTorch v2.6.0, megatron-core v0.11.0, and Triton v3.2.0.
+
+### Benchmarks
+None
+
+## Setup
+
+1. **Clone the GitHub repository:**
+```bash
+git clone https://github.com/CentML/lorafusion.git
+cd lorafusion
+git checkout eurosys-ae
+```
+
+2. **Install the requirements by running this command or following `../docs/installation.md`:**
+```bash
+conda create -y -n lorafusion python=3.12
+conda activate lorafusion
+cd benchmarks_paper
+bash scripts/setup/setup_env.sh
+```
+
+3. **Download the Hugging Face models and datasets. Make sure you are logged in and have access to them:**
+```bash
+# huggingface-cli login
+python prepare_models.py
+python gen_sample_distribution.py
+```
+
+## Evaluation Workflow
+
+### Major Claims
+
+- **(C1)**: LoRAFusion is up to 1.96× faster (1.47× on average) than Megatron-LM, and up to 1.46× faster (1.29× on average) than mLoRA. See Section 4.1 and Figure 1.
+
+- **(C2)**: Our fused kernels are up to 1.39× faster (1.27× on average) and can replace existing LoRA kernels. See Section 4.2 and Figures 5, 6, and 7.
+
+### Experiments
+
+1. **Make sure you are in the `benchmarks_paper` directory.**
+
+2. **Run the experiments:**
+```bash
+bash scripts/run_all.sh
+```
+
+   a. This runs all the main experiments and the kernel performance tests; it takes about 4 hours.
+
+   b. Check `scripts/run_all.sh` for the exact commands and the timing of each experiment.
+
+   c. You can easily modify it to run only a subset of the experiments.
+
+3. **Check the results in the `results` directory. The script automatically creates plots like those in Figure 0, Figure 1, Figure 5, Figure 6, and Figure 7.**
+
+## Notes on Reusability
+
+To customize experiments, edit `scripts/run_all.sh` and the related sub-scripts.
+We provide detailed scripts for each experiment and the corresponding Python scripts to generate the plots.
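
Before launching `scripts/run_all.sh`, it can help to confirm that the setup steps above produced the expected stack; a minimal sanity check, assuming the `lorafusion` conda environment is active (the printed versions should roughly match the software dependencies listed in the README):

```bash
# Expect four H100 GPUs to be listed.
nvidia-smi --list-gpus
# Expect PyTorch 2.6.0 built against CUDA 12.6, Triton 3.2.0, and a visible device count of 4.
python -c "import torch, triton; print(torch.__version__, torch.version.cuda, triton.__version__, torch.cuda.device_count())"
```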

benchmarks_paper/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+"""Benchmarks for the paper."""
