We are ByteDance Seed team.
You can get to know us better through the following channels👇
We have modified the original SWE-agent (2024.07 version) to be compatible with Multi-SWE-Bench! MSwe-agent can be used to evaluate the performance of LLMs across 7 languages (C++, C, Java, Go, Rust, TypeScript, JavaScript) on the Multi-SWE-Bench dataset.
```bash
conda env create -f environment.yml
conda activate mswe-agent
sudo chmod +x preprocess_data.sh
./preprocess_data.sh
```
Configure `keys.cfg` to make sure your OpenAI API key is set correctly.
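A minimal `keys.cfg` might look like the following, assuming the key name used by the original SWE-agent (2024.07 version); the value is a placeholder:

```
OPENAI_API_KEY: 'your-openai-api-key'
```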
We provide two ways to run the agent on Multi-SWE-Bench:
```bash
# Single-process run.
# --pre_build_all_images: set to True to build all images up front; otherwise images are built case by case.
# --remove_image: set to True to remove each image after running; otherwise images are kept.
# --pr_file: dataset file for the language you want to run.
python3 run.py \
    --model_name gpt4o \
    --cache_task_images True \
    --per_instance_api_calls_limit 50 \
    --pre_build_all_images True \
    --remove_image False \
    --pr_file data/go_verified.jsonl \
    --config_file config/default.yaml --skip_existing=True \
    --per_instance_cost_limit 5.00 \
    --print_config=False \
    --max_workers_build_image 16
```
```bash
# Multi-process run: RUNNING_THREADS sets the number of parallel workers for multirun.py.
export RUNNING_THREADS=30
python3 multirun.py \
    --model_name gpt4o \
    --cache_task_images True \
    --per_instance_api_calls_limit 50 \
    --pre_build_all_images True \
    --remove_image False \
    --pr_file data/go.jsonl \
    --config_file config/default.yaml --skip_existing=True \
    --per_instance_cost_limit 5.00 \
    --print_config=False \
    --max_workers_build_image 16
```
We provide prebuilt images for each instance. You can download them directly from our Docker Hub rather than building them locally.
After running the agent, all predicted patches are saved in the `trajectories` directory as `all_preds.jsonl`. You can then evaluate them with the multi-swe-bench repo.
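Before evaluation, you can sanity-check the predictions file with a few lines of Python; a minimal sketch, assuming the usual SWE-agent output fields `instance_id` and `model_patch` (check one line of your file to confirm, and adjust the path placeholders):

```python
import json

# The exact subdirectory under trajectories/ depends on your user name,
# model, and run settings -- adjust before running.
preds_path = "trajectories/<user>/<run_name>/all_preds.jsonl"

with open(preds_path) as f:
    preds = [json.loads(line) for line in f]

print(f"loaded {len(preds)} predictions")
for p in preds[:3]:
    # "instance_id" and "model_patch" are assumed field names.
    print(p.get("instance_id"), "| patch length:", len(p.get("model_patch") or ""))
```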
To run the evaluation, you need to prepare the following:
- Patch Files: Some patch files in JSONL format, each item containing (see the example after this list):
  - `org`: Organization name
  - `repo`: Repository name
  - `number`: Pull request number
  - `fix_patch`: Fix patch content
- Dataset Files: Dataset files in JSONL format available on Hugging Face, such as Multi-SWE-Bench
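For reference, a single line of such a patch file could be produced like this; a minimal sketch where the output path and all field values are made-up placeholders, and only the four field names come from the list above:

```python
import json

# Hypothetical record; replace the placeholder values with your own.
record = {
    "org": "example-org",      # organization name
    "repo": "example-repo",    # repository name
    "number": 123,             # pull request number
    "fix_patch": "diff --git a/main.go b/main.go\n...",  # fix patch content
}

with open("data/patches/my_preds.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")
```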
Then you can run the evaluation using the following command:
```bash
cd multi-swe-bench
python -m multi_swe_bench.harness.run_evaluation --config /path/to/your/config.json
```
An example `config.json`:

```json
{
    "mode": "evaluation",
    "workdir": "./data/workdir",
    "patch_files": [
        "./data/patches/<your_patch_file>.jsonl"
    ],
    "dataset_files": [
        "./data/patches/<to_evaluate_dataset_file>.jsonl"
    ],
    "force_build": false,
    "output_dir": "./data/dataset",
    "specifics": [],
    "skips": [],
    "repo_dir": "./data/repos",
    "need_clone": false,
    "global_env": [],
    "clear_env": true,
    "stop_on_error": true,
    "max_workers": 8,
    "max_workers_build_image": 8,
    "max_workers_run_instance": 8,
    "log_dir": "./data/logs",
    "log_level": "DEBUG"
}
```
| Parameter | Description |
|---|---|
| `mode` | Execution mode for the script. Options: `"evaluation"`, `"instance"`, `"instance_only"`, `"image"`. Default: `"evaluation"` |
| `workdir` | Working directory path for evaluation operations |
| `patch_files` | List of patch file paths in JSONL format (supports glob patterns) |
| `dataset_files` | List of dataset file paths in JSONL format (supports glob patterns) |
| `force_build` | Whether to force rebuilding Docker images even if they already exist |
| `output_dir` | Directory path for output results |
| `specifics` | List of specific PR IDs to evaluate (empty = all) |
| `skips` | List of PR IDs to skip during evaluation |
| `repo_dir` | Directory containing cloned repositories |
| `need_clone` | Whether repositories should be cloned if not present |
| `global_env` | Global environment variables to pass to Docker containers (format: `"KEY=VALUE"`) |
| `clear_env` | Whether to clear environment variables in Docker containers |
| `stop_on_error` | Whether to stop execution when an error occurs |
| `max_workers` | Maximum number of concurrent worker threads for general tasks |
| `max_workers_build_image` | Maximum number of concurrent worker threads for building Docker images |
| `max_workers_run_instance` | Maximum number of concurrent worker threads for running instances |
| `log_dir` | Directory for log files |
| `log_level` | Logging level. Options: `"DEBUG"`, `"INFO"`, `"WARNING"`, `"ERROR"`, `"CRITICAL"` |
This project is licensed under Apache License 2.0. See the LICENSE file for details.
If you find MSwe-agent useful for your research and applications, feel free to give us a star ⭐ or cite us using:
```bibtex
@article{zan2024swe,
  title={SWE-bench-java: A GitHub Issue Resolving Benchmark for Java},
  author={Zan, Daoguang and Huang, Zhirong and Yu, Ailun and Lin, Shaoxin and Shi, Yifan and Liu, Wei and Chen, Dong and Qi, Zongshuai and Yu, Hao and Yu, Lei and others},
  journal={arXiv preprint arXiv:2408.14354},
  year={2024}
}
```
🏢 About ByteDance Seed Team
Founded in 2023, ByteDance Seed Team is dedicated to crafting the industry's most advanced AI foundation models. The team aspires to become a world-class research team and make significant contributions to the advancement of science and society.