{"id":28017884,"url":"https://github.com/multi-swe-bench/MopenHands","last_synced_at":"2025-05-10T12:01:43.978Z","repository":{"id":285764234,"uuid":"956361729","full_name":"multi-swe-bench/MopenHands","owner":"multi-swe-bench","description":null,"archived":false,"fork":false,"pushed_at":"2025-04-02T14:18:34.000Z","size":30537,"stargazers_count":0,"open_issues_count":4,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-04-02T15:27:42.855Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/multi-swe-bench.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-03-28T05:58:53.000Z","updated_at":"2025-04-02T14:18:37.000Z","dependencies_parsed_at":"2025-04-02T15:38:58.796Z","dependency_job_id":null,"html_url":"https://github.com/multi-swe-bench/MopenHands","commit_stats":null,"previous_names":["multi-swe-bench/mopenhands"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/multi-swe-bench%2FMopenHands","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/multi-swe-bench%2FMopenHands/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/multi-swe-bench%2FMopenHands/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/multi-swe-bench%2FMopenHands/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/multi-swe-bench"
,"download_url":"https://codeload.github.com/multi-swe-bench/MopenHands/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253411522,"owners_count":21904147,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-05-10T12:01:39.801Z","updated_at":"2025-05-10T12:01:43.972Z","avatar_url":"https://github.com/multi-swe-bench.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n 👋 Hi, everyone! \n    \u003cbr\u003e\n    We are \u003cb\u003eByteDance Seed team.\u003c/b\u003e\n\u003c/div\u003e\n\n\u003cp align=\"center\"\u003e\n  You can get to know us better through the following channels👇\n  \u003cbr\u003e\n  \u003ca href=\"https://team.doubao.com/\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/Website-%231e37ff?style=for-the-badge\u0026logo=bytedance\u0026logoColor=white\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/user-attachments/assets/93481cda-a7f3-47f3-b333-fe6b3da86b78\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/WeChat-07C160?style=for-the-badge\u0026logo=wechat\u0026logoColor=white\"\u003e\u003c/a\u003e\n \u003ca href=\"https://www.xiaohongshu.com/user/profile/668e7e15000000000303157d?xsec_token=ABl2-aqekpytY6A8TuxjrwnZskU-6BsMRE_ufQQaSAvjc%3D\u0026xsec_source=pc_search\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/Xiaohongshu-%23FF2442?style=for-the-badge\u0026logo=xiaohongshu\u0026logoColor=white\"\u003e\u003c/a\u003e\n  \u003ca 
href=\"https://www.zhihu.com/org/dou-bao-da-mo-xing-tuan-dui/\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/zhihu-%230084FF?style=for-the-badge\u0026logo=zhihu\u0026logoColor=white\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n![seed logo](https://github.com/user-attachments/assets/c42e675e-497c-4508-8bb9-093ad4d1f216)\n\n\n## 🚀 MopenHands: Multi-SWE-Bench Inference with OpenHands\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://github.com/multi-swe-bench/multi-swe-bench\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/Multi_SWE_bench-Project Page-yellow\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://arxiv.org/pdf/2502.19811\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/Multi_SWE_bench-Tech Report-red\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://huggingface.co/datasets/bytedance-research/Multi-SWE-Bench\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/Multi_SWE_bench-Hugging Face-orange\"\u003e\u003c/a\u003e\n  \u003cbr\u003e\n  \u003ca href=\"https://huggingface.co/Multi-SWE-RL\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/Multi_SWE_RL_Community-Hugging Face-EE9A12\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://discord.gg/EtfbkfqUuN\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/Multi_SWE_RL_Community-Discord-1449DA\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/multi-swe-bench/multi-swe-bench/blob/main/LICENSE\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/License-Apache-blue\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\nWe have modified the original [**OpenHands**](https://github.com/All-Hands-AI/OpenHands) (version 0.25.0) to make it compatible with [**Multi-SWE-Bench**](https://github.com/multi-swe-bench/multi-swe-bench)! MopenHands can be used to evaluate the performance of LLMs across 7 languages (C++, C, Java, Go, Rust, TypeScript, JavaScript) in the [**Multi-SWE-Bench** dataset](https://huggingface.co/datasets/bytedance-research/Multi-SWE-Bench).\n\n\n## To Start\n### 1. 
Environment Preparation\n```bash\nconda create -n openhands python=3.12 conda-forge::nodejs conda-forge::poetry\nconda activate openhands\nmake build\n```\nMake sure a Docker environment is available on your local machine.\n\nThen create a file named `config.toml` and add your model settings and API key to it, for example:\n```toml\n[llm.YYY]\nmodel = \"llm.xxx\"\nbase_url = \"xxx\"\napi_key = \"xxx\"\n```\n\n### 2. Dataset Preparation\nFirst download the [**Multi-SWE-Bench** dataset](https://huggingface.co/datasets/bytedance-research/Multi-SWE-Bench).\nThen convert the dataset by following `/evaluation/benchmarks/swe_bench/data/data_change.py`.\n\n\n## Run Inference on SWE-Bench Instances\n\n```bash\nbash evaluation/benchmarks/swe_bench/infer.sh\n```\n### Explanation\n\n- `models`, e.g. `llm.eval_gpt4_1106_preview`, is the config group name for your\nLLM settings, as defined in your `config.toml`.\n- `git-version`, e.g. `HEAD`, is the git commit hash of the OpenHands version you would\nlike to evaluate. It could also be a release tag like `0.6.2`.\n- `agent`, e.g. `CodeActAgent`, is the name of the agent for benchmarks, defaulting to `CodeActAgent`.\n- `eval_limit`, e.g. `10`, limits the evaluation to the first `eval_limit` instances. By\ndefault, the script evaluates up to 500 issues, never more than the dataset contains.\n- `max_iter`, e.g. `20`, is the maximum number of iterations for the agent to run. By\ndefault, it is set to 50.\n- `num_workers`, e.g. `3`, is the number of parallel workers to run the evaluation. By\ndefault, it is set to 1.\n- `language` is the programming language of the dataset being evaluated.\n- `dataset` is the absolute path to the dataset JSONL file.\n\n### Images\nWe provide the images for each instance. 
You can download the images directly from [our Docker Hub site](https://hub.docker.com/repositories/mopenhands0) rather than building them locally.\n\n## 📊 Evaluation\nAfter running the agent, all the predicted patches will be saved in the `evaluation/evaluation_outputs` directory, in a file named `output.jsonl`. You can extract the `git_patch` of each instance and then evaluate it in the [multi-swe-bench](https://github.com/multi-swe-bench/multi-swe-bench) repo.\n\n### Run Evaluation\n\nTo run the evaluation, you need to prepare the following:\n\n1. Patch Files: Patch files in JSONL format, with each item containing:\n   - `org`: Organization Name\n   - `repo`: Repository Name\n   - `number`: Pull Request Number\n   - `fix_patch`: Fix Patch Content\n2. Dataset Files: Dataset files in JSONL format available on Hugging Face, such as [Multi-SWE-Bench](https://huggingface.co/datasets/Multi-SWE-RL/Multi-SWE-Bench)\n\nThen you can run the evaluation using the following command:\n\n```bash\ncd multi-swe-bench\npython -m multi_swe_bench.harness.run_evaluation --config /path/to/your/config.json\n```\n\n## 📜 License\nThis project is licensed under the Apache License 2.0. 
See the [LICENSE](/LICENSE) file for details.\n## 📖 Citation\nIf you find our Multi-SWE-bench and MopenHands useful for your research and applications, feel free to give us a star ⭐ or cite us using:\n\n```bibtex\n@misc{zan2025multiswebench,\n      title={Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving}, \n      author={Daoguang Zan and Zhirong Huang and Wei Liu and Hanwu Chen and Linhao Zhang and Shulin Xin and Lu Chen and Qi Liu and Xiaojian Zhong and Aoyan Li and Siyao Liu and Yongsheng Xiao and Liangqiang Chen and Yuyu Zhang and Jing Su and Tianyu Liu and Rui Long and Kai Shen and Liang Xiang},\n      year={2025},\n      eprint={2504.02605},\n      archivePrefix={arXiv},\n      primaryClass={cs.SE},\n      url={https://arxiv.org/abs/2504.02605}, \n}\n```\n## 🏢 About [ByteDance Seed Team](https://team.doubao.com/)\n\nFounded in 2023, ByteDance Seed Team is dedicated to crafting the industry's most advanced AI foundation models. The team aspires to become a world-class research team and make significant contributions to the advancement of science and society.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmulti-swe-bench%2FMopenHands","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmulti-swe-bench%2FMopenHands","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmulti-swe-bench%2FMopenHands/lists"}