This repository contains the code, dataset, and models from our paper "Fixing Performance Bugs Through LLM Explanations" (IEEE AITest 2025). We provide tools to extract performance bugs from Defects4J, fine-tune GPT-4o-mini for performance bug detection, and evaluate the results.
View Project Website | Interactive Presentation | Paper (IEEE Xplore)
- Total Performance Bugs: 490
- Projects: 17 Defects4J projects
- Categories: 5 (Algorithmic, Memory, CPU, Redundant Computation, I/O)
- Training/Test Split: 392/98 (80/20)
- Dataset - The 490 performance bugs and per-category training/test splits
- Bug Extraction - Scripts to extract performance bugs from Defects4J projects
- Bug Categorization - Classifies bugs into the five performance categories
- Explanation Generation - Generates LLM-based explanations for each bug
- Model Training - Fine-tuning code for GPT-4o-mini
- Evaluation Framework - Metrics, benchmarks, and model comparison
- Performance Validation - Tools to validate that fixes improve performance
- Notebooks - Jupyter notebooks for analysis and visualization
- Conference Presentation - Interactive slides
- Installation Guide - Step-by-step setup instructions
- Usage Guide - How to use each component
- Dataset Description - Detailed dataset documentation
# Clone the repository
git clone https://github.com/SuryanshSS1011/Performance-Bugs-LLM.git
cd Performance-Bugs-LLM
# Set up the environment (creates venv and installs requirements)
./scripts/setup_environment.sh
# Run the end-to-end pipeline
python main.py

- data/: The 490 performance bugs dataset, training splits, and evaluation reports
- extraction/: Scripts for extracting performance bugs from Defects4J
- categorization/: Classifies bugs into the five performance categories
- explanation/: Generates LLM-based bug explanations
- processing/: Method-level code extraction
- models/: Fine-tuning code for GPT-4o-mini
- evaluation/: Evaluation metrics, comparison framework, and reports
- validation/: Code for validating performance improvements
- notebooks/: Jupyter notebooks for exploration and visualization
- results/visualizations/: Generated figures (Fig. 1–3 from the paper)
- scripts/: Setup script and figure generation
- docs/: Installation, usage, and dataset documentation
- ref/: Reference materials (technical guide; paper PDF gitignored)
- Python 3.8+
- Java 8
- Maven 3.6+
- Git
- At least 50GB free disk space
- Clone the repository:

git clone https://github.com/SuryanshSS1011/Performance-Bugs-LLM.git
cd Performance-Bugs-LLM

- Create a virtual environment:

python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate

- Install dependencies:

pip install -r requirements.txt

- Set up environment variables:

cp .env.example .env
# Edit .env and add your OpenAI API key

- Install Defects4J (only needed if reproducing bug extraction). Follow the official setup at https://github.com/rjust/defects4j and ensure the defects4j command is on your PATH.
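
The prerequisites listed above can be checked before starting the installation. The following is a minimal sketch and is not part of the repository: the function names and constants are hypothetical, and it only tests whether each tool is on the PATH, not whether the installed Java or Maven version matches the ones listed.

```python
import shutil
import sys

# Tools named in the prerequisites; defects4j is only needed for bug extraction.
REQUIRED_TOOLS = ["java", "mvn", "git"]
OPTIONAL_TOOLS = ["defects4j"]
MIN_PYTHON = (3, 8)
MIN_FREE_GB = 50


def check_prerequisites(path="."):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < MIN_PYTHON:
        problems.append("Python 3.8+ is required")
    for tool in REQUIRED_TOOLS:
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    # Free space on the filesystem that contains `path`, in GiB.
    free_gb = shutil.disk_usage(path).free / 1024**3
    if free_gb < MIN_FREE_GB:
        problems.append(f"only {free_gb:.1f} GB free; {MIN_FREE_GB} GB recommended")
    return problems


def missing_optional_tools():
    """Return the optional tools that are not on PATH."""
    return [tool for tool in OPTIONAL_TOOLS if shutil.which(tool) is None]


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
    for tool in missing_optional_tools():
        print(f"NOTE: {tool} not found (only needed for bug extraction)")
```

Running the script from the repository root prints one line per missing prerequisite and nothing when every check passes.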
The dataset is provided as JSON:
import json
from collections import Counter

with open('data/performance_bugs_490.json', 'r') as f:
    bugs = json.load(f)

print(f"Total bugs: {len(bugs)}")
print("Category distribution:", Counter(b['category'] for b in bugs))

Per-category training splits live in data/training/ as JSONL files (train_algorithmic_inefficiency.jsonl, etc.) plus train_combined.jsonl and test_combined.jsonl.
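
The JSONL splits can be read one record per line. The sketch below is illustrative only: read_jsonl and load_splits are hypothetical helpers, and nothing is assumed about the fields inside each record beyond every non-blank line being a valid JSON value.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Parse a JSONL file into a list of records, skipping blank lines."""
    records = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def load_splits(training_dir="data/training"):
    """Load train_combined.jsonl and test_combined.jsonl from training_dir."""
    base = Path(training_dir)
    train = read_jsonl(base / "train_combined.jsonl")
    test = read_jsonl(base / "test_combined.jsonl")
    return train, test
```

If the combined files mirror the 392/98 split reported for the dataset, len(train) and len(test) should match those numbers, which is a quick sanity check after cloning.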
The end-to-end pipeline is orchestrated by main.py:

python main.py

Individual stages can also be run via their modules:

- Extraction: python -m extraction.defects4j_extractor
- Categorization: python -m categorization.bug_categorizer
- Explanation generation: python -m explanation.explanation_generator
- Fine-tuning: python -m models.fine_tuning_executor
- Evaluation: python -m evaluation.comprehensive_evaluator
- Performance validation: python -m validation.performance_tester
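
To run only some of the stages in sequence, the module names above can be driven from a small wrapper. This is a sketch under stated assumptions, not the behavior of main.py: it assumes each module runs without required command-line arguments when invoked from the repository root, and that the listed order is the intended pipeline order. The STAGES table, build_command, and run_stages are hypothetical names.

```python
import subprocess
import sys

# Stage label -> module name, in the order the stages are listed above.
STAGES = [
    ("extraction", "extraction.defects4j_extractor"),
    ("categorization", "categorization.bug_categorizer"),
    ("explanation", "explanation.explanation_generator"),
    ("fine_tuning", "models.fine_tuning_executor"),
    ("evaluation", "evaluation.comprehensive_evaluator"),
    ("validation", "validation.performance_tester"),
]


def build_command(module):
    """Build the `python -m <module>` command using the current interpreter."""
    return [sys.executable, "-m", module]


def run_stages(names=None, dry_run=False):
    """Run the selected stages in order and return the commands that were built.

    names=None selects every stage; dry_run=True builds the commands without running them.
    """
    commands = []
    for name, module in STAGES:
        if names is not None and name not in names:
            continue
        cmd = build_command(module)
        commands.append(cmd)
        if not dry_run:
            print(f"Running {name}: {' '.join(cmd)}")
            # check=True stops the sequence at the first failing stage.
            subprocess.run(cmd, check=True)
    return commands
```

For example, run_stages(["categorization", "evaluation"], dry_run=True) returns the two commands without executing them, which is useful for checking the selection before a long run.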
The three figures from the paper (Fig. 1: category distribution; Fig. 2: per-project breakdown; Fig. 3: per-category precision/recall/F1) can be regenerated from the published numbers:

python scripts/generate_paper_figures.py

Outputs are written to results/visualizations/.
- Installation Guide - Detailed setup instructions
- Usage Guide - How to use each component
- Dataset Description - Detailed dataset documentation
- Technical Guide - Implementation notes and design decisions
| Metric | Base Model | Fine-tuned Model |
|---|---|---|
| Accuracy | 67.3% | 83.7% |
| Precision | 65.1% | 83.0% |
| Recall | 64.2% | 81.8% |
| F1 Score | 64.6% | 82.3% |
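
The gain from fine-tuning is easiest to read as a difference in percentage points. The short sketch below uses only the values in the table above; the variable names are illustrative.

```python
# Values copied from the results table (percent).
base = {"Accuracy": 67.3, "Precision": 65.1, "Recall": 64.2, "F1 Score": 64.6}
fine_tuned = {"Accuracy": 83.7, "Precision": 83.0, "Recall": 81.8, "F1 Score": 82.3}

# Absolute improvement of the fine-tuned model over the base model, in percentage points.
gains = {metric: round(fine_tuned[metric] - base[metric], 1) for metric in base}

for metric, gain in gains.items():
    print(f"{metric}: +{gain} points")
```

Every metric improves by between 16.4 and 17.9 percentage points.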
| Category | Count | Percentage |
|---|---|---|
| Algorithmic Inefficiency | 165 | 33.7% |
| Memory Usage | 116 | 23.7% |
| CPU Overhead | 99 | 20.2% |
| Redundant Computation | 54 | 11.0% |
| I/O Inefficiency | 56 | 11.4% |
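
The percentages in the table follow directly from the counts and the dataset size. The sketch below recomputes them, along with the 80/20 split, from the published numbers only; it does not read the dataset files.

```python
# Counts copied from the category table.
counts = {
    "Algorithmic Inefficiency": 165,
    "Memory Usage": 116,
    "CPU Overhead": 99,
    "Redundant Computation": 54,
    "I/O Inefficiency": 56,
}

total = sum(counts.values())

# Share of each category, rounded to one decimal place as in the table.
percentages = {name: round(100 * n / total, 1) for name, n in counts.items()}

# 80/20 split sizes, using integer arithmetic to avoid float rounding.
train_size = total * 80 // 100
test_size = total - train_size

print(total, train_size, test_size)
for name, pct in percentages.items():
    print(f"{name}: {pct}%")
```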
If you use this work, please cite both the paper and the archived dataset/code:
Paper
@inproceedings{sijwali2025fixing,
title={Fixing Performance Bugs Through LLM Explanations},
author={Sijwali, Suryansh Singh and Colom, Angela Marie and Guo, Anbi and Saha, Suman},
booktitle={2025 IEEE International Conference on Artificial Intelligence Testing (AITest)},
year={2025},
pages={102--109},
doi={10.1109/AITest66680.2025.00020}
}

Dataset and code (Zenodo)
@software{sijwali2025fixing_artifact,
title={Performance-Bugs-LLM: Dataset and Code for "Fixing Performance Bugs Through LLM Explanations"},
author={Sijwali, Suryansh Singh and Colom, Angela Marie and Guo, Anbi and Saha, Suman},
year={2025},
publisher={Zenodo},
doi={10.5281/zenodo.20113202},
url={https://doi.org/10.5281/zenodo.20113202}
}

This project is licensed under the MIT License - see the LICENSE file for details.
- The Defects4J team for providing the bug dataset
- OpenAI for GPT-4o-mini access
- All contributors and reviewers
For questions or issues, please:
- Open an issue on GitHub
- Contact the authors through the paper