https://github.com/facebookresearch/swe-rl
Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution"
- Host: GitHub
- URL: https://github.com/facebookresearch/swe-rl
- Owner: facebookresearch
- License: other
- Created: 2025-02-23T18:00:58.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2025-03-16T21:31:36.000Z (9 months ago)
- Last Synced: 2025-03-20T13:48:02.695Z (9 months ago)
- Language: Python
- Homepage: https://arxiv.org/abs/2502.18449
- Size: 514 KB
- Stars: 462
- Watchers: 8
- Forks: 34
- Open Issues: 4
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
- Awesome-RL-for-LRMs - facebookresearch/swe-rl
- StarryDivineSky - facebookresearch/swe-rl - The official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution", which aims to improve large language model (LLM) reasoning on open-source software evolution tasks through reinforcement learning. Its core innovation is combining a reinforcement learning framework with the evolution history of open-source software, training models to generate, repair, and optimize code more effectively. The project focuses on practical software development problems such as code completion, bug fixing, and version migration, using data from real-world open-source projects for training and evaluation. SWE-RL is built on a reinforcement learning reward mechanism: the model is trained on tasks simulated from the software evolution process (such as fixing bugs and refactoring code) and earns higher rewards for completing them, which optimizes its decision-making. By analyzing the evolution history of large amounts of open-source code, the model learns to generate semantically sound and functionally correct code, with feedback driven by code-quality metrics such as correctness and readability. The project also provides a standardized benchmark dataset containing evolution trajectories and task definitions from multiple open-source projects, used to validate the model's generalization across scenarios. Key features: 1) the first application of reinforcement learning to LLM software-evolution tasks, moving beyond the limits of traditional supervised learning; 2) a dynamic task environment that lets the model adapt to complex changes in code structure; 3) a complete training framework and evaluation tools supporting multiple LLM architectures. Experiments show that SWE-RL significantly outperforms baseline models on code generation and repair tasks, especially complex tasks involving multi-step reasoning and contextual dependencies. The project also open-sources the reward-function design and environment-interaction interfaces used during training, providing reusable infrastructure for follow-up research. (A01_Text Generation_Text Dialogue / Large language dialogue models and data)
- AwesomeLLM4APR - 2025-NeurIPS
- awesome-deep-reasoning - SWE-RL - Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution (RelatedRepos / Advanced Reasoning for Coding)
README
# SWE-RL
🧐 About | 🚀 Quick Start | 🐣 Agentless Mini | 📝 Citation | 🙏 Acknowledgements
## 🧐 About
Official codebase for our paper: **SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution** ([link](https://arxiv.org/abs/2502.18449)).
**SWE-RL** is the first approach to scale reinforcement learning based LLM reasoning for real-world software engineering, leveraging open-source software evolution data and rule-based rewards.

> [!NOTE]
> We have undertaken significant code refactoring to enhance quality and accessibility. However, this may introduce potential inconsistencies with our internal implementation. If you encounter a bug, please file an issue. We are also gradually updating the repo to include additional information.
## 🚀 Quick start
```bash
git clone https://github.com/facebookresearch/swe-rl && cd swe-rl
pip install -e ".[dev]"
pytest
```
The code currently provides our prompt templates and the implementation of the reward function based on sequence similarity.
You can find them in [src/swerl/core/prompts.py](src/swerl/core/prompts.py) and [src/swerl/core/reward.py](src/swerl/core/reward.py) respectively.
We provide three reward function APIs:
1. `calculate_search_replace_reward`: calculates the similarity between search/replace changes and oracle changes (this is what we used in the paper);
2. `calculate_reward_unidiff`: calculates the similarity between two sets of unified diffs;
3. `calculate_reward`: a more general API that can be paired with any editing format.
### Reward for search/replace changes
A toy example of how you can use the reward function in your own project:
``````python
import swerl
file = """
def sort_list(lst):
return sorted(lst)
""".strip()
oracle_file = """
def sort_list(lst: list[int]) -> list[int]:
return sorted(lst)
""".strip()
context = {"example.py": file}
oracle = {"example.py": oracle_file}
output = """
...thoughts by LLM
```python
### example.py
<<<<<<< SEARCH
def sort_list(lst):
=======
def sort_list(lst: list[int]) -> list[int]:
>>>>>>> REPLACE
```
""".strip()
reward, metadata = swerl.core.reward.calculate_search_replace_reward(context, oracle, output)
assert reward == 1.0
print(metadata)
``````
### Reward for unified diff
Check `swerl.core.reward.calculate_reward_unidiff`. Here is the signature:
```python
def calculate_reward_unidiff(
    oracle_patches: list[str], pred_patches: list[str]
) -> tuple[float, dict]:
    """
    Compute the SWE-RL reward given two sets of unified diffs.

    The return value is always within the range of [0, 1].

    Args:
        oracle_patches: A list of oracle diffs.
        pred_patches: A list of predicted diffs.

    Returns:
        A float value representing the reward, and a dictionary containing some metadata.
    """
```
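For illustration, here is a minimal, hypothetical usage sketch; the diff string is invented, and only the import style mirrors the toy example above:

```python
import swerl

oracle_patch = """\
--- a/example.py
+++ b/example.py
@@ -1,2 +1,2 @@
-def sort_list(lst):
+def sort_list(lst: list[int]) -> list[int]:
     return sorted(lst)
"""

# Using the oracle diff as the prediction purely for illustration;
# identical inputs should yield the maximum reward.
reward, metadata = swerl.core.reward.calculate_reward_unidiff([oracle_patch], [oracle_patch])
print(reward, metadata)
```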
### General version
Check `swerl.core.reward.calculate_reward`. Here is the signature:
```python
def calculate_reward(
    code_context: dict[str, str],
    oracle_new_content: dict[str, str],
    pred_new_content: dict[str, str],
) -> tuple[float, dict]:
    """
    Compute the SWE-RL reward given the code context, oracle patch, and the model output.

    Note that this function is a general version of the reward calculation, which can be used
    for code changes in any form, not just search/replace edits. For search/replace edits, use
    `calculate_search_replace_reward`.

    The return value is always within the range of [0, 1].

    Args:
        code_context: path -> original content of the file. It doesn't need to
            contain the entire codebase, only the files that are affected by the oracle patch.
        oracle_new_content: path -> oracle new content of the file after change.
        pred_new_content: path -> predicted new content of the file after change.

    Returns:
        A float value representing the reward, and a dictionary containing some metadata.
    """
```
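As with the search/replace example above, here is a small, hypothetical sketch of calling the general API directly with path-to-content dictionaries; the file contents are invented for illustration:

```python
import swerl

code_context = {"example.py": "def add(a, b):\n    return a + b\n"}
oracle_new_content = {"example.py": "def add(a: int, b: int) -> int:\n    return a + b\n"}
# Pretend the model produced exactly the oracle content; in practice this comes from
# applying the model's edits (in whatever format you use) to the original files.
pred_new_content = dict(oracle_new_content)

reward, metadata = swerl.core.reward.calculate_reward(
    code_context, oracle_new_content, pred_new_content
)
print(reward, metadata)  # identical contents should give the maximum reward
```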
## 🐣 Agentless Mini

Agentless Mini builds on top of [Agentless](https://github.com/OpenAutoCoder/Agentless) with the following key improvements and functionality changes:
1. Fast async inference with [openai-python](https://github.com/openai/openai-python) (see the sketch after this list).
2. Code refactoring for better scalability, parallelization, and accessibility.
3. Only file-level localization is performed, and the entire file content is used for repair.
4. Support for using multiple reproduction tests for reranking.
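
To make point 1 concrete, here is a minimal sketch, not the repository's actual code, of the concurrent-request pattern that openai-python's `AsyncOpenAI` client enables. The model name and prompts are placeholders:

```python
# Hypothetical illustration of async inference with openai-python; the model
# name and prompts below are placeholders, not values used by Agentless Mini.
import asyncio

from openai import AsyncOpenAI


async def complete(client: AsyncOpenAI, prompt: str) -> str:
    response = await client.chat.completions.create(
        model="meta-llama/Llama-3.3-70B-Instruct",  # whatever model your endpoint serves
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


async def main() -> None:
    # Reads OPENAI_API_KEY and OPENAI_BASE_URL from the environment (see below).
    client = AsyncOpenAI()
    prompts = ["Localize the bug in ...", "Propose a repair for ..."]  # illustrative only
    results = await asyncio.gather(*(complete(client, p) for p in prompts))
    for result in results:
        print(result)


asyncio.run(main())
```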
### Environment setup
To get started, run the following command to install the dependencies:
```bash
git clone https://github.com/facebookresearch/swe-rl && cd swe-rl
pip install -e ".[agentless]"
```
Agentless Mini works with any OpenAI-compatible endpoint.
If you want to host your own Hugging Face models, popular choices are [vLLM](https://docs.vllm.ai/en/latest/) and [SGLang](https://docs.sglang.ai/). Taking vLLM as an example:
```bash
# Host Llama-3.3-70B-Instruct with vLLM
pip install vllm
vllm serve meta-llama/Llama-3.3-70B-Instruct --tensor-parallel-size 4 --port 8000
# The endpoint URL will be http://localhost:8000/v1
```
Finally, you would need to set up some environment variables required by Agentless Mini:
```bash
# Assume you're doing the above vLLM setup
# Otherwise, just adjust them accordingly
export OPENAI_API_KEY="Empty"
export OPENAI_BASE_URL="http://localhost:8000/v1"
# Whether "thinking" is in the model output (yes/no). If so, we need to extract the answer block
# during parsing and ignore the thinking. We assume the answer is enclosed in a pair of answer tags.
# Check src/swerl/agentless_mini/utils/envs.py to learn how to adjust them.
export THINKING=no
# A temporary directory used to process patches
export PLAYGROUND_DIR="tmp_agentless"
# Please download it from https://github.com/OpenAutoCoder/Agentless/releases/download/v1.5.0/swebench_repo_structure.txt
export PROJECT_FILE_LOC="/path/to/swebench/repo_structures"
# The tokenizer model. Can be either huggingface or tiktoken model name
export TOKENIZER_MODEL="meta-llama/Llama-3.3-70B-Instruct"
```
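Before launching the pipeline, you may want to confirm that the endpoint configured above is actually reachable. Here is a quick, optional check (not part of Agentless Mini) using the same openai-python client:

```python
# Optional sanity check: send one tiny request to the OpenAI-compatible endpoint
# configured via OPENAI_API_KEY / OPENAI_BASE_URL.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY and OPENAI_BASE_URL from the environment
response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # the model name served by your endpoint
    messages=[{"role": "user", "content": "Reply with 'ok' if you can read this."}],
    max_tokens=8,
)
print(response.choices[0].message.content)
```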
### Repair with oracle files
Now you can run Agentless Mini once the environment variables are properly configured.
Below is the simplest setup, where oracle files are provided for repair. This can be a good proxy for the end-to-end result:
```bash
# Make sure you are in the root directory of swe-rl
#
# Agentless Mini supports sharding. If you are using a compute cluster, you can run
# different shards on different compute nodes to parallelize the evaluation.
# Below, we set num_shards to 125, so each shard will have (500 / 125) instances, where
# 500 is the number of problems in SWE-bench Verified.
#
# NOTE: for SWE-bench Lite, please specify --dataset princeton-nlp/SWE-bench_Lite
python -m swerl.agentless_mini.repair \
    --loc_file resources/sweb_verified_gt_loc.jsonl \
    --output_folder demo_gt_repair \
    --shard 0 \
    --num_shards 125 \
    --num_samples 1 \
    --temperature 0.0 \
    --model "meta-llama/Llama-3.3-70B-Instruct"

# Get your all_preds.jsonl (the patch folder and sample count must match the repair run above)
python -m swerl.agentless_mini.rerank \
    --patch_folder demo_gt_repair \
    --num_samples 1 \
    --output_file demo_gt_repair/all_preds.jsonl \
    --deduplicate
```
### Full pipeline
#### Localization + repair
You can also run the full pipeline. We show a greedy-decoding demo below:
```bash
NUM_SAMPLES=1
COMMON_ARGS=(
    # NOTE: for SWE-bench Lite, please specify --dataset princeton-nlp/SWE-bench_Lite
    --shard 0
    --num_shards 125
    --num_samples ${NUM_SAMPLES}
    --temperature 0.0
    --model "meta-llama/Llama-3.3-70B-Instruct"
    # Check --max_concurrent_requests on how to control the concurrency
)

ROOT=demo_agentless
LOC_FILE=${ROOT}/loc.jsonl
REPAIR_DIR=${ROOT}/repair
PRED_FILE=${ROOT}/all_preds.jsonl

# Localization
python -m swerl.agentless_mini.localize \
    --output_file ${LOC_FILE} \
    ${COMMON_ARGS[@]}

# Optionally, check localization performance
python -m swerl.agentless_mini.tools.check_loc_perf --locfile ${LOC_FILE}

# Repair
python -m swerl.agentless_mini.repair \
    --loc_file ${LOC_FILE} \
    --output_folder ${REPAIR_DIR} \
    ${COMMON_ARGS[@]}

# Rerank
python -m swerl.agentless_mini.rerank \
    --patch_folder ${REPAIR_DIR} \
    --num_samples ${NUM_SAMPLES} \
    --output_file ${PRED_FILE} \
    --deduplicate

# Now the ${PRED_FILE} will be ready. If you get all empty outputs, it means
# the model isn't generating correctly formatted edits. Then you should consider
# changing your base model or sampling more locations & repairs.
```
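To check this quickly, the sketch below (not part of the repository) counts empty versus non-empty predictions; the `model_patch` field name is an assumption based on the standard SWE-bench prediction format, so adjust it if your output differs:

```python
# Count empty vs. non-empty patches in the rerank output (illustrative only).
import json

empty, non_empty = 0, 0
with open("demo_agentless/all_preds.jsonl") as f:
    for line in f:
        pred = json.loads(line)
        # "model_patch" follows the standard SWE-bench prediction format (assumption).
        if pred.get("model_patch", "").strip():
            non_empty += 1
        else:
            empty += 1
print(f"non-empty patches: {non_empty}, empty patches: {empty}")
```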
#### Reproduction test generation
> [!NOTE]
> Reproduction test generation, regression test selection, and test execution are WIP due to refactoring and infra differences.
> They will be updated shortly.
## 📝 Citation
```bibtex
@article{wei2025swerl,
  title={SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution},
  author={Yuxiang Wei and Olivier Duchenne and Jade Copet and Quentin Carbonneaux and Lingming Zhang and Daniel Fried and Gabriel Synnaeve and Rishabh Singh and Sida I. Wang},
  year={2025},
  journal={arXiv preprint arXiv:2502.18449}
}
```
## 🙏 Acknowledgements
[Agentless](https://github.com/OpenAutoCoder/Agentless),
[SWE-Gym](https://github.com/SWE-Gym/SWE-Gym),
[SWE-Fixer](https://github.com/InternLM/SWE-Fixer),
[SWE-bench](https://github.com/SWE-bench/SWE-bench),
[Moatless EvalTools](https://eval.moatless.ai/),
[Nebius SWE-agent](https://nebius.com/blog/posts/training-and-search-for-software-engineering-agents).
## License
The majority of SWE-RL is licensed under CC BY-NC 4.0, however portions of the project are available under separate license terms: [Agentless Mini](src/swerl/agentless_mini) is licensed under the MIT license.