Performance Bugs Through LLM Explanations

This repository contains the code, dataset, and models from our paper "Fixing Performance Bugs Through LLM Explanations" (IEEE AITest 2025). We provide tools to extract performance bugs from Defects4J, fine-tune GPT-4o-mini for performance bug detection, and evaluate the results.

🌐 View Project Website | 📊 Interactive Presentation | 📄 Paper (IEEE Xplore)

📊 Dataset Statistics

  • Total Performance Bugs: 490
  • Projects: 17 Defects4J projects
  • Categories: 5 (Algorithmic, Memory, CPU, Redundant Computation, I/O)
  • Training/Test Split: 392/98 (80/20)
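
The 392/98 figures follow directly from an 80/20 cut of 490 items. A minimal sketch of such a split is shown here; the shuffle, the seed, and the helper name `split_indices` are illustrative assumptions, and the repository's own splitting procedure (which may, for example, be stratified by category) is not described in this README.

```python
import random


def split_indices(n_items, train_fraction=0.8, seed=0):
    """Shuffle indices 0..n_items-1 and cut them into train and test lists.

    The seed is an arbitrary illustrative choice, not the one used for the paper.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    cut = round(n_items * train_fraction)
    return indices[:cut], indices[cut:]


train_idx, test_idx = split_indices(490)
print(len(train_idx), len(test_idx))  # 392 98
```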

📑 Documentation Index

Core Components

Detailed Guides

🚀 Quick Start

# Clone the repository
git clone https://github.com/SuryanshSS1011/Performance-Bugs-LLM.git
cd Performance-Bugs-LLM

# Set up the environment (creates venv and installs requirements)
./scripts/setup_environment.sh

# Run the end-to-end pipeline
python main.py

📁 Repository Structure

  • data/: The 490 performance bugs dataset, training splits, and evaluation reports
  • extraction/: Scripts for extracting performance bugs from Defects4J
  • categorization/: Classifies bugs into the five performance categories
  • explanation/: Generates LLM-based bug explanations
  • processing/: Method-level code extraction
  • models/: Fine-tuning code for GPT-4o-mini
  • evaluation/: Evaluation metrics, comparison framework, and reports
  • validation/: Code for validating performance improvements
  • notebooks/: Jupyter notebooks for exploration and visualization
  • results/visualizations/: Generated figures (Fig. 1–3 from the paper)
  • scripts/: Setup script and figure generation
  • docs/: Installation, usage, and dataset documentation
  • ref/: Reference materials (technical guide; paper PDF gitignored)

🔧 Installation

Prerequisites

  • Python 3.8+
  • Java 8
  • Maven 3.6+
  • Git
  • At least 50GB free disk space

Setup

  1. Clone the repository:
git clone https://github.com/SuryanshSS1011/Performance-Bugs-LLM.git
cd Performance-Bugs-LLM
  2. Create a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt
  4. Set up environment variables:
cp .env.example .env
# Edit .env and add your OpenAI API key
  5. Install Defects4J (only needed if reproducing bug extraction). Follow the official setup at https://github.com/rjust/defects4j and ensure the defects4j command is on your PATH.
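
Before running any stage that calls the OpenAI API, it can help to confirm that the key from the environment-variable step is actually visible. The sketch below uses only the standard library and assumes the variable is named `OPENAI_API_KEY` (the name the OpenAI SDK reads by default); check `.env.example` for the name this repository actually uses. The helper names are hypothetical.

```python
import os


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def api_key_is_configured(path=".env", name="OPENAI_API_KEY"):
    """Return True if `name` is non-empty in the process environment or in the .env file.

    The variable name is an assumption; adjust it to match .env.example.
    """
    if os.environ.get(name):
        return True
    if not os.path.exists(path):
        return False
    return bool(read_env_file(path).get(name))


print("API key configured:", api_key_is_configured())
```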

📊 Using the Dataset

The dataset is provided as JSON:

import json
from collections import Counter

with open('data/performance_bugs_490.json', 'r') as f:
    bugs = json.load(f)

print(f"Total bugs: {len(bugs)}")
print("Category distribution:", Counter(b['category'] for b in bugs))

Per-category training splits live in data/training/ as JSONL files (train_algorithmic_inefficiency.jsonl, etc.) plus train_combined.jsonl and test_combined.jsonl.
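
JSONL files hold one JSON object per line, so they are read line by line rather than with a single `json.load`. The following is a minimal sketch of a loader; `load_jsonl` is a hypothetical helper, and no assumption is made about the fields inside each record, since the record schema is not documented in this README.

```python
import json
import os


def load_jsonl(path):
    """Read a JSON Lines file: one JSON value per non-empty line."""
    records = []
    with open(path, "r", encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_number}: invalid JSON") from exc
    return records


# Paths taken from the description above; skipped if the files are not present.
for name in ("train_combined.jsonl", "test_combined.jsonl"):
    path = os.path.join("data", "training", name)
    if os.path.exists(path):
        print(name, len(load_jsonl(path)), "records")
```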

🤖 Reproducing the Pipeline

The end-to-end pipeline is orchestrated by main.py:

python main.py

Individual stages can also be run via their modules:

  • Extraction: python -m extraction.defects4j_extractor
  • Categorization: python -m categorization.bug_categorizer
  • Explanation generation: python -m explanation.explanation_generator
  • Fine-tuning: python -m models.fine_tuning_executor
  • Evaluation: python -m evaluation.comprehensive_evaluator
  • Performance validation: python -m validation.performance_tester
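
To run the stages one after another without going through main.py, a small driver along these lines could be used. This is a sketch under stated assumptions: the order simply follows the list above, each module is assumed to run without extra arguments, and `run_stages` is a hypothetical helper rather than part of the repository. It defaults to a dry run that only returns the commands.

```python
import subprocess
import sys

# Module names are taken from the list above; the ordering is assumed.
STAGE_MODULES = [
    ("extraction", "extraction.defects4j_extractor"),
    ("categorization", "categorization.bug_categorizer"),
    ("explanation", "explanation.explanation_generator"),
    ("fine-tuning", "models.fine_tuning_executor"),
    ("evaluation", "evaluation.comprehensive_evaluator"),
    ("validation", "validation.performance_tester"),
]


def run_stages(stages=STAGE_MODULES, dry_run=True):
    """Build `python -m <module>` commands; execute them in order unless dry_run."""
    commands = [[sys.executable, "-m", module] for _, module in stages]
    if dry_run:
        return commands
    for (name, _), command in zip(stages, commands):
        print(f"Running {name}: {' '.join(command)}")
        # check=True stops the sequence at the first failing stage.
        subprocess.run(command, check=True)
    return commands


for command in run_stages(dry_run=True):
    print(" ".join(command))
```

Call `run_stages(dry_run=False)` from the repository root to actually execute the stages.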

📈 Regenerating Paper Figures

The three figures from the paper, Fig. 1 (category distribution), Fig. 2 (per-project breakdown), and Fig. 3 (per-category precision, recall, and F1), can be regenerated from the published numbers:

python scripts/generate_paper_figures.py

Outputs are written to results/visualizations/.

📚 Documentation

📊 Key Results

| Metric | Base Model | Fine-tuned Model |
|---|---|---|
| Accuracy | 67.3% | 83.7% |
| Precision | 65.1% | 83.0% |
| Recall | 64.2% | 81.8% |
| F1 Score | 64.6% | 82.3% |
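
The gain from fine-tuning is easiest to read as a difference in percentage points. The short sketch below computes it from the numbers in the table above; the dictionary layout and the function name are illustrative only.

```python
# (base model %, fine-tuned model %) copied from the Key Results table.
RESULTS = {
    "Accuracy": (67.3, 83.7),
    "Precision": (65.1, 83.0),
    "Recall": (64.2, 81.8),
    "F1 Score": (64.6, 82.3),
}


def improvement_points(results):
    """Return fine-tuned minus base, in percentage points, rounded to 0.1."""
    return {metric: round(tuned - base, 1) for metric, (base, tuned) in results.items()}


for metric, gain in improvement_points(RESULTS).items():
    print(f"{metric}: +{gain} points")
```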

🔍 Category Distribution

| Category | Count | Percentage |
|---|---|---|
| Algorithmic Inefficiency | 165 | 33.7% |
| Memory Usage | 116 | 23.7% |
| CPU Overhead | 99 | 20.2% |
| Redundant Computation | 54 | 11.0% |
| I/O Inefficiency | 56 | 11.4% |
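
The percentages are each count divided by the 490-bug total. As a quick consistency check, the sketch below recomputes them from the counts listed above; the variable and function names are illustrative.

```python
# Counts copied from the Category Distribution table.
CATEGORY_COUNTS = {
    "Algorithmic Inefficiency": 165,
    "Memory Usage": 116,
    "CPU Overhead": 99,
    "Redundant Computation": 54,
    "I/O Inefficiency": 56,
}


def category_percentages(counts):
    """Return each category's share of the total, as a percentage rounded to 0.1."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


print("Total:", sum(CATEGORY_COUNTS.values()))
for name, pct in category_percentages(CATEGORY_COUNTS).items():
    print(f"{name}: {pct}%")
```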

📖 Citation

If you use this work, please cite both the paper and the archived dataset/code:

Paper

@inproceedings{sijwali2025fixing,
  title={Fixing Performance Bugs Through LLM Explanations},
  author={Sijwali, Suryansh Singh and Colom, Angela Marie and Guo, Anbi and Saha, Suman},
  booktitle={2025 IEEE International Conference on Artificial Intelligence Testing (AITest)},
  year={2025},
  pages={102--109},
  doi={10.1109/AITest66680.2025.00020}
}

Dataset and code (Zenodo)

@software{sijwali2025fixing_artifact,
  title={Performance-Bugs-LLM: Dataset and Code for "Fixing Performance Bugs Through LLM Explanations"},
  author={Sijwali, Suryansh Singh and Colom, Angela Marie and Guo, Anbi and Saha, Suman},
  year={2025},
  publisher={Zenodo},
  doi={10.5281/zenodo.20113202},
  url={https://doi.org/10.5281/zenodo.20113202}
}

📄 License

This project is licensed under the MIT License; see the LICENSE file for details.

🙏 Acknowledgments

  • The Defects4J team for providing the bug dataset
  • OpenAI for GPT-4o-mini access
  • All contributors and reviewers

📧 Contact

For questions or issues, please:

  • Open an issue on GitHub
  • Contact the authors through the paper

🔗 Links
