{"id":28961742,"url":"https://github.com/getappmap/swe-bench","last_synced_at":"2025-06-24T02:04:50.750Z","repository":{"id":244853138,"uuid":"801525507","full_name":"getappmap/SWE-bench","owner":"getappmap","description":"Our fork of SWE-bench","archived":false,"fork":false,"pushed_at":"2024-08-19T22:12:37.000Z","size":2759,"stargazers_count":0,"open_issues_count":7,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-08-20T02:19:49.113Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/getappmap.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-16T11:59:02.000Z","updated_at":"2024-08-19T22:12:40.000Z","dependencies_parsed_at":"2024-08-01T18:18:57.533Z","dependency_job_id":"ee34233b-e8a1-4873-8213-df51f5f87678","html_url":"https://github.com/getappmap/SWE-bench","commit_stats":null,"previous_names":["getappmap/swe-bench"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/getappmap/SWE-bench","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/getappmap%2FSWE-bench","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/getappmap%2FSWE-bench/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/getappmap%2FSWE-bench/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/getappmap%2FSWE-bench/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/getappmap",
"download_url":"https://codeload.github.com/getappmap/SWE-bench/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/getappmap%2FSWE-bench/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261589912,"owners_count":23181437,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-24T02:04:49.807Z","updated_at":"2025-06-24T02:04:50.672Z","avatar_url":"https://github.com/getappmap.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Setup \n\n```\n$ conda env create --name swe-bench --file environment.yml\n$ conda activate swe-bench\n$ mkdir appmaps appmap_logs /tmp/swe-appmaps\n$ ./appmap/make_appmaps.sh\n```\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://github.com/princeton-nlp/Llamao\"\u003e\n    \u003cimg src=\"assets/swellama_banner.png\" width=\"50%\" alt=\"Kawi the SWE-Llama\" /\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n\u003cdiv align=\"center\"\u003e\n\n | [日本語](docs/README_JP.md) | [English](https://github.com/princeton-nlp/SWE-bench) | [中文简体](docs/README_CN.md) | [中文繁體](docs/README_TW.md) |\n\n\u003c/div\u003e\n\n\n---\n\u003cp align=\"center\"\u003e\nCode and data for our ICLR 2024 paper \u003ca href=\"http://swe-bench.github.io/paper.pdf\"\u003eSWE-bench: Can Language Models Resolve Real-World GitHub Issues?\u003c/a\u003e\n    \u003c/br\u003e\n    \u003c/br\u003e\n    \u003ca href=\"https://www.python.org/\"\u003e\n        \u003cimg alt=\"Build\" 
src=\"https://img.shields.io/badge/Python-3.8+-1f425f.svg?color=purple\"\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://copyright.princeton.edu/policy\"\u003e\n        \u003cimg alt=\"License\" src=\"https://img.shields.io/badge/License-MIT-blue\"\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://badge.fury.io/py/swebench\"\u003e\n        \u003cimg src=\"https://badge.fury.io/py/swebench.svg\"\u003e\n    \u003c/a\u003e\n\u003c/p\u003e\n\nPlease refer to our [website](http://swe-bench.github.io) for the public leaderboard and the [change log](https://github.com/princeton-nlp/SWE-bench/blob/main/CHANGELOG.md) for information on the latest updates to the SWE-bench benchmark.\n\n## 📰 News\n* **[Apr. 15, 2024]**: SWE-bench has gone through major improvements to resolve issues with the evaluation harness. Read more in our [report](https://github.com/princeton-nlp/SWE-bench/blob/main/docs/20240415_eval_bug/README.md).\n* **[Apr. 2, 2024]**: We have released [SWE-agent](https://github.com/princeton-nlp/SWE-agent), which sets the state-of-the-art on the full SWE-bench test set! ([Tweet 🔗](https://twitter.com/jyangballin/status/1775114444370051582))\n* **[Jan. 16, 2024]**: SWE-bench has been accepted to ICLR 2024 as an oral presentation! ([OpenReview 🔗](https://openreview.net/forum?id=VTF8yNQM66))\n\n## 👋 Overview\nSWE-bench is a benchmark for evaluating large language models on real-world software issues collected from GitHub.\nGiven a *codebase* and an *issue*, a language model is tasked with generating a *patch* that resolves the described problem.\n\n\u003cimg src=\"assets/teaser.png\"\u003e\n\n## 🚀 Set Up\nTo build SWE-bench from source, follow these steps:\n1. Clone this repository locally.\n2. `cd` into the repository.\n3. Run `conda env create -f environment.yml` to create a conda environment named `swe-bench`.\n4. 
Activate the environment with `conda activate swe-bench`\n\n## 💽 Usage\nYou can download the SWE-bench dataset directly ([dev](https://drive.google.com/uc?export=download\u0026id=1SbOxHiR0eXlq2azPSSOIDZz-Hva0ETpX), [test](https://drive.google.com/uc?export=download\u0026id=164g55i3_B78F6EphCZGtgSrd2GneFyRM) sets) or from [HuggingFace](https://huggingface.co/datasets/princeton-nlp/SWE-bench).\n\nTo use SWE-bench, you can:\n* Train your own models on our pre-processed datasets  \n* Run [inference](https://github.com/princeton-nlp/SWE-bench/blob/main/inference/) on existing models (either models you have on-disk like LLaMA, or models you have access to through an API like GPT-4). In the inference step, the model is given a repo and an issue and tries to generate a fix for it.\n* [Evaluate](https://github.com/princeton-nlp/SWE-bench/blob/main/swebench/harness/) models against SWE-bench. This is where you take a SWE-bench task and a model-proposed solution and evaluate its correctness. \n* Run SWE-bench's [data collection procedure](https://github.com/princeton-nlp/SWE-bench/blob/main/swebench/collect/) on your own repositories, to make new SWE-bench tasks. 
\n\n## ⬇️ Downloads\n| Datasets | Models |\n| - | - |\n| [🤗 SWE-bench](https://huggingface.co/datasets/princeton-nlp/SWE-bench) | [🦙 SWE-Llama 13b](https://huggingface.co/princeton-nlp/SWE-Llama-13b) |\n| [🤗 \"Oracle\" Retrieval](https://huggingface.co/datasets/princeton-nlp/SWE-bench_oracle) | [🦙 SWE-Llama 13b (PEFT)](https://huggingface.co/princeton-nlp/SWE-Llama-13b-peft) |\n| [🤗 BM25 Retrieval 13K](https://huggingface.co/datasets/princeton-nlp/SWE-bench_bm25_13K) | [🦙 SWE-Llama 7b](https://huggingface.co/princeton-nlp/SWE-Llama-7b) |\n| [🤗 BM25 Retrieval 27K](https://huggingface.co/datasets/princeton-nlp/SWE-bench_bm25_27K) | [🦙 SWE-Llama 7b (PEFT)](https://huggingface.co/princeton-nlp/SWE-Llama-7b-peft) |\n| [🤗 BM25 Retrieval 40K](https://huggingface.co/datasets/princeton-nlp/SWE-bench_bm25_40K) | |\n| [🤗 BM25 Retrieval 50K (Llama tokens)](https://huggingface.co/datasets/princeton-nlp/SWE-bench_bm25_50k_llama)   | |\n\n## 🍎 Tutorials\nWe've also written the following blog posts on how to use different parts of SWE-bench.\nIf you'd like to see a post about a particular topic, please let us know via an issue.\n* [Nov 1. 2023] Collecting Evaluation Tasks for SWE-Bench ([🔗](https://github.com/princeton-nlp/SWE-bench/tree/main/tutorials/collection.md))\n* [Nov 6. 2023] Evaluating on SWE-bench ([🔗](https://github.com/princeton-nlp/SWE-bench/tree/main/tutorials/evaluation.md))\n\n## 💫 Contributions\nWe would love to hear from the broader NLP, Machine Learning, and Software Engineering research communities, and we welcome any contributions, pull requests, or issues!\nTo do so, please either file a new pull request or issue and fill in the corresponding templates accordingly. We'll be sure to follow up shortly!\n\nContact person: [Carlos E. 
Jimenez](http://www.carlosejimenez.com/) and [John Yang](https://john-b-yang.github.io/) (Email: {carlosej, jy1682}@princeton.edu).\n\n## ✍️ Citation\nIf you find our work helpful, please use the following citations.\n```\n@inproceedings{\n    jimenez2024swebench,\n    title={{SWE}-bench: Can Language Models Resolve Real-world Github Issues?},\n    author={Carlos E Jimenez and John Yang and Alexander Wettig and Shunyu Yao and Kexin Pei and Ofir Press and Karthik R Narasimhan},\n    booktitle={The Twelfth International Conference on Learning Representations},\n    year={2024},\n    url={https://openreview.net/forum?id=VTF8yNQM66}\n}\n```\n\n## 🪪 License\nMIT. Check `LICENSE.md`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgetappmap%2Fswe-bench","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgetappmap%2Fswe-bench","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgetappmap%2Fswe-bench/lists"}