
Commit 5642ebf

Init commit (move from the private repo to the public one)
1 parent eff797c commit 5642ebf

110 files changed: 31,425 additions, 52 deletions (large commit; only a subset of the changed files is shown below)

.gitignore

Lines changed: 26 additions & 51 deletions
@@ -1,6 +1,26 @@
+.vscode/
+*.pt.trace.json
+*.hatchet
+tmps/
+results/
+unsloth_compiled_cache/
+benchmarks/llama-factory/LLaMA-Factory/data/
+benchmarks/llama-factory/LLaMA-Factory/
+benchmarks/llama-factory/results*
+benchmarks/llama-factory/data/
+benchmarks/peft/datasets/
+benchmarks/peft/llama3-8b-gsm8k-lora/
+lorafusion/simulator/_*.py
+benchmarks/DoRA/
+
+# NPM
+node_modules
+package-lock.json
+/package.json
+
 # Byte-compiled / optimized / DLL files
 __pycache__/
-*.py[codz]
+*.py[cod]
 *$py.class

 # C extensions
@@ -46,7 +66,7 @@ htmlcov/
 nosetests.xml
 coverage.xml
 *.cover
-*.py.cover
+*.py,cover
 .hypothesis/
 .pytest_cache/
 cover/
@@ -94,36 +114,23 @@ ipython_config.py
 # install all needed dependencies.
 #Pipfile.lock

-# UV
-# Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
-# This is especially recommended for binary packages to ensure reproducibility, and is more
-# commonly ignored for libraries.
-#uv.lock
-
 # poetry
 # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
 # This is especially recommended for binary packages to ensure reproducibility, and is more
 # commonly ignored for libraries.
 # https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
 #poetry.lock
-#poetry.toml

 # pdm
 # Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
-# pdm recommends including project-wide configuration in pdm.toml, but excluding .pdm-python.
-# https://pdm-project.org/en/latest/usage/project/#working-with-version-control
 #pdm.lock
-#pdm.toml
+# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
+# in version control.
+# https://pdm.fming.dev/latest/usage/project/#working-with-version-control
+.pdm.toml
 .pdm-python
 .pdm-build/

-# pixi
-# Similar to Pipfile.lock, it is generally recommended to include pixi.lock in version control.
-#pixi.lock
-# Pixi creates a virtual environment in the .pixi directory, just like venv module creates one
-# in the .venv directory. It is recommended not to include this directory in version control.
-.pixi
-
 # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
 __pypackages__/

@@ -136,7 +143,6 @@ celerybeat.pid

 # Environments
 .env
-.envrc
 .venv
 env/
 venv/
@@ -174,34 +180,3 @@ cython_debug/
 # and can be added to the global gitignore or merged into this file. For a more nuclear
 # option (not recommended) you can uncomment the following to ignore the entire idea folder.
 #.idea/
-
-# Abstra
-# Abstra is an AI-powered process automation framework.
-# Ignore directories containing user credentials, local state, and settings.
-# Learn more at https://abstra.io/docs
-.abstra/
-
-# Visual Studio Code
-# Visual Studio Code specific template is maintained in a separate VisualStudioCode.gitignore
-# that can be found at https://github.com/github/gitignore/blob/main/Global/VisualStudioCode.gitignore
-# and can be added to the global gitignore or merged into this file. However, if you prefer,
-# you could uncomment the following to ignore the entire vscode folder
-# .vscode/
-
-# Ruff stuff:
-.ruff_cache/
-
-# PyPI configuration file
-.pypirc
-
-# Cursor
-# Cursor is an AI-powered code editor. `.cursorignore` specifies files/directories to
-# exclude from AI features like autocomplete and code analysis. Recommended for sensitive data
-# refer to https://docs.cursor.com/context/ignore-files
-.cursorignore
-.cursorindexingignore
-
-# Marimo
-marimo/_static/
-marimo/_lsp/
-__marimo__/
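
The new ignore rules can be spot-checked with `git check-ignore`; a minimal sketch run from the repository root, with illustrative paths:

```bash
# Print which ignore pattern (file and line) matches each candidate path.
# Paths that are not ignored produce no output and a non-zero exit status.
git check-ignore -v results/ tmps/ benchmarks/DoRA/ node_modules
```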

.hadolint.yaml

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+ignored:
+  - DL3006 # Always tag the version of an image explicitly.
+  - DL3013 # Pin versions in pip.
+  - DL3008 # Pin versions in apt get install.
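
Hadolint reads this configuration automatically when invoked from the repository root; a minimal usage sketch, assuming an illustrative `Dockerfile` path to lint:

```bash
# DL3006, DL3008, and DL3013 are suppressed via the ignored list in .hadolint.yaml.
hadolint Dockerfile
# The config file can also be passed explicitly:
hadolint --config .hadolint.yaml Dockerfile
```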

.pre-commit-config.yaml

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
+exclude: |
+  (?x)(
+    ^playgrounds/|
+    ^benchmarks/|
+    ^\.git/|
+    ^\.ruff_cache/|
+    ^loomtrain\.egg-info/|
+    \.pt\.trace\.json$|
+    \.txt$
+  )
+repos:
+  - repo: https://github.com/psf/black
+    rev: 25.1.0
+    hooks:
+      - id: black
+        language_version: python3
+        args: [--line-length=88]
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    rev: v0.12.3
+    hooks:
+      - id: ruff
+        args: [--fix, --exit-non-zero-on-fix]
+      - id: ruff-format
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v5.0.0
+    hooks:
+      - id: check-added-large-files
+        args: [--maxkb=1000]
+      - id: check-json
+      - id: check-yaml
+      - id: check-toml
+      - id: end-of-file-fixer
+      - id: trailing-whitespace
+  - repo: https://github.com/scop/pre-commit-shfmt
+    rev: v3.12.0-1
+    hooks:
+      - id: shfmt
+        args:
+          - --indent=2
+          - --write
+          - --simplify
+  - repo: https://github.com/shellcheck-py/shellcheck-py
+    rev: v0.10.0.1
+    hooks:
+      - id: shellcheck
+  - repo: https://github.com/hadolint/hadolint
+    rev: v2.12.0
+    hooks:
+      - id: hadolint-docker
+        name: dockerfile-lint
+        types_or: [dockerfile]
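
To run these hooks locally, the standard pre-commit workflow applies; a minimal sketch, assuming `pre-commit` is available in the active Python environment:

```bash
# Register the git hook so the checks run on every commit.
pre-commit install
# Run every configured hook against the entire repository once.
pre-commit run --all-files
```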

README.md

Lines changed: 14 additions & 1 deletion
@@ -1 +1,14 @@
-# lorafusion
+# LoRAFusion
+
+LoRAFusion: Efficient LoRA Fine-Tuning for LLMs
+
+[![Python 3.12+](https://img.shields.io/badge/python-3.12+-blue.svg)](https://www.python.org/)
+[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit)](https://github.com/pre-commit/pre-commit)
+
+## Guides and Documentation
+
+- General Guides
+  - [Installation](./docs/installation.md)
+  - [Development](./docs/development.md)
+- Artifact Evaluation
+  - [Evaluation Instructions](./benchmarks_paper/README.md)

benchmarks_paper/README.md

Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
+# LoRAFusion Artifact Evaluation
+
+> Note: The figure numbering can be confusing. Figure "0" is the data-distribution plot, and Figure "1" is the
+main end-to-end plot in the evaluation section. Figures "2" and "3" are the two subfigures for the L40 GPU evaluation. Figure "4"
+shows the scaling behavior of the different methods. Figure "5" shows forward/backward kernel performance. Figure "6" shows
+normalized per-layer performance. Figure "7" is the kernel NCU profile. Figure "8" shows the pipeline bubbles for the different
+configurations.
+
+We provide the source code of LoRAFusion and scripts to reproduce the major experimental results from the paper.
+This appendix shows how to generate the plots in Figure 0 (data distributions), Figure 1 (end-to-end results), Figure 5 (kernel performance), Figure 6 (layer-wise performance), and Figure 7 (memory traffic reduction).
+We provide installation instructions and scripts to set up the environment.
+To reproduce the results, you need at least 192 GB of RAM, 256 GB of disk space, and 4 NVIDIA H100 GPUs.
+
+## Description & Requirements
+
+### Hardware dependencies
+You need a Linux machine with at least 192 GB of RAM, 256 GB of free disk space, and 4 NVIDIA H100 GPUs connected via NVLink.
+
+### Software dependencies
+You need Conda to set up the environment. The environment includes CUDA 12.6, PyTorch v2.6.0, megatron-core v0.11.0, and Triton v3.2.0.
+
+### Benchmarks
+None
+
+## Setup
+
+1. **Clone the GitHub repository:**
+```bash
+git clone https://github.com/CentML/lorafusion.git
+cd lorafusion
+git checkout eurosys-ae
+```
+
+2. **Install the requirements by running this command or following `../docs/installation.md`:**
+```bash
+conda create -y -n lorafusion python=3.12
+conda activate lorafusion
+cd benchmarks_paper
+bash scripts/setup/setup_env.sh
+```
+
+3. **Download the Hugging Face models and datasets. Make sure you are logged in and have access to them:**
+```bash
+# huggingface-cli login
+python prepare_models.py
+python gen_sample_distribution.py
+```
+
+## Evaluation Workflow
+
+### Major Claims
+
+- **(C1)**: LoRAFusion is up to 1.96× faster (1.47× on average) than Megatron-LM, and up to 1.46× faster (1.29× on average) than mLoRA. See Section 4.1 and Figure 1.
+
+- **(C2)**: Our fused kernels are up to 1.39× faster (1.27× on average) and can replace existing LoRA kernels. See Section 4.2 and Figures 5, 6, and 7.
+
+### Experiments
+
+1. **Make sure you are in the `benchmarks_paper` directory.**
+
+2. **Run the experiments:**
+```bash
+bash scripts/run_all.sh
+```
+
+   a. This runs all the main experiments and the kernel performance tests; it takes about 4 hours.
+
+   b. Check `scripts/run_all.sh` for the exact commands and the timing of each experiment.
+
+   c. You can easily modify it to run only a subset of the experiments.
+
+3. **Check the results in the `results` directory. The script automatically creates plots like those in Figure 0, Figure 1, Figure 5, Figure 6, and Figure 7.**
+
+## Notes on Reusability
+
+To customize experiments, edit `scripts/run_all.sh` and the related sub-scripts.
+We provide detailed scripts for each experiment and the corresponding Python scripts to generate the plots.
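
Before launching `scripts/run_all.sh`, it can help to confirm that the setup steps above produced the expected stack; a minimal sanity check, assuming the `lorafusion` conda environment is active (the printed versions should roughly match the software dependencies listed in the README):

```bash
# Expect four H100 GPUs to be listed.
nvidia-smi --list-gpus
# Expect PyTorch 2.6.0 built against CUDA 12.6, Triton 3.2.0, and a visible device count of 4.
python -c "import torch, triton; print(torch.__version__, torch.version.cuda, triton.__version__, torch.cuda.device_count())"
```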

benchmarks_paper/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+"""Benchmarks for the paper."""
