{"id":31584113,"url":"https://github.com/tiger-ai-lab/editreward","last_synced_at":"2026-02-14T23:30:56.756Z","repository":{"id":317536831,"uuid":"1067285853","full_name":"TIGER-AI-Lab/EditReward","owner":"TIGER-AI-Lab","description":"EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing [ICLR 2026]","archived":false,"fork":false,"pushed_at":"2026-02-06T07:57:15.000Z","size":17343,"stargazers_count":119,"open_issues_count":6,"forks_count":4,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-02-06T14:43:16.984Z","etag":null,"topics":["diffusion","editing","evaluation"],"latest_commit_sha":null,"homepage":"https://tiger-ai-lab.github.io/EditReward/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TIGER-AI-Lab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-30T16:31:10.000Z","updated_at":"2026-02-06T10:32:28.000Z","dependencies_parsed_at":"2025-10-28T20:26:35.807Z","dependency_job_id":"55a45369-83e9-4228-a979-001ddf73f312","html_url":"https://github.com/TIGER-AI-Lab/EditReward","commit_stats":null,"previous_names":["tiger-ai-lab/editreward"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/TIGER-AI-Lab/EditReward","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TIGER-AI-Lab%2FEditReward","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TIGER-AI-Lab%2FEditReward/tags","releases_url":"https:/
/repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TIGER-AI-Lab%2FEditReward/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TIGER-AI-Lab%2FEditReward/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/TIGER-AI-Lab","download_url":"https://codeload.github.com/TIGER-AI-Lab/EditReward/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TIGER-AI-Lab%2FEditReward/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29460669,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-14T22:42:09.113Z","status":"ssl_error","status_checked_at":"2026-02-14T22:42:05.053Z","response_time":53,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["diffusion","editing","evaluation"],"created_at":"2025-10-06T00:22:10.598Z","updated_at":"2026-02-14T23:30:56.751Z","avatar_url":"https://github.com/TIGER-AI-Lab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\" width=\"100%\"\u003e\n\u003cimg src=\"./assets/logo.png\"  width=\"50%\"\u003e\n\u003c/p\u003e\n\n\u003cdiv align=\"center\"\u003e\n\n# ✨[ICLR 2026] EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing\n \n[![Project 
Website](https://img.shields.io/badge/🌐-Project%20Website-deepgray)](https://tiger-ai-lab.github.io/EditReward/)\n[![arXiv](https://img.shields.io/badge/arXiv-2509.26346-b31b1b.svg)](https://arxiv.org/abs/2509.26346)\n[![Model](https://img.shields.io/badge/🤗-Model-yellow)](https://huggingface.co/collections/TIGER-Lab/editreward-68ddf026ef9eb1510458abc6)\n[![Dataset](https://img.shields.io/badge/🤗-Dataset-green)](https://huggingface.co/datasets/TIGER-Lab/EditReward-Data)\n[![Benchmark](https://img.shields.io/badge/📊-Benchmark-yello)](https://huggingface.co/datasets/TIGER-Lab/EditReward-Bench)\n\u003c/div\u003e\n\n\u003cp align=\"center\" style=\"font-size: 10em;\"\u003e\n  We acknowledge the data contribution and support from    \n  \u003ca href=\"https://www.abaka.ai/\" target=\"_blank\"\u003e\n    \u003cimg src=\"./assets/logo_abaka.png\"\n         alt=\"Abaka AI\"\n         style=\"height:2.1em; vertical-align:text-bottom; position:relative; top:7px;\"\n  \u003c/a\u003e\n\u003c/p\u003e\n\n\n\n## 📖 Introduction\n\nThis is the official implementation for the paper: [EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing](https://arxiv.org/abs/2509.26346).\nIn this paper, we introduce **EditReward**, a human-aligned reward model powered by a high-quality dataset for instruction-guided image editing. We first construct **EditReward-Data**, a large-scale, high-fidelity preference dataset for instruction-guided image editing. It comprises over 200K manually annotated preference pairs, covering a diverse range of edits produced by seven state-of-the-art models across twelve distinct sources. Every preference annotation in **EditReward-Data** was curated by trained annotators following a rigorous and standardized protocol, ensuring high alignment with considered human judgment and minimizing label noise. Using this dataset, we train the reward model **EditReward** to score instruction-guided image edits. 
To rigorously assess **EditReward** and future models, we also introduce **EditReward-Bench**, a new benchmark built upon our high-quality annotations, which includes more difficult multi-way preference prediction.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/pipeline.png\" alt=\"Teaser\" width=\"900\"/\u003e\n\u003c/p\u003e\n\n\n## 📰 News\n- **[2026-02-06]** 🔥 We started maintaining a list of \u003ca href=\"#-awesome-works-using-editreward\"\u003eAwesome Works\u003c/a\u003e using EditReward!\n- **[2026-01-27]** 🔥 Add training \u0026 inference support for **Qwen3-VL Series**!\n- **[2026-01-26]** 🔥 Our paper has been accepted by **ICLR 2026**!\n- **[2025-10-29]** 🔥 Release the training guideline of EditReward, see [Training Instructions](EditReward/TRAIN_README.md)!\n- **[2025-10-14]** 🔥 Release the evaluation code and guideline of EditReward-Bench, see [Evaluation Instructions](EditReward/evaluate/README.md)!\n- **[2025-10-10]** 🔥 Release our evaluation benchmark EditReward-Bench. Welcome to use it!\n- **[2025-10-08]** 🔥 Release our training dataset EditReward-Data. Welcome to use it!\n- **[2025-10-03]** 🔥 Release inference code and pretrained model.\n- **[2025-10-01]** 🎉 We initialize the official repo of EditReward.\n\n\u003c!-- TODO List --\u003e\n## 🚧 TODO List\n- [x] Release inference code and pretrained model\n- [x] Release evaluation benchmark\n- [x] Release training code\n- [x] Release training dataset\n\u003c!-- - [ ] Release better model --\u003e\n\n## 📄 Table of Contents\n- [🛠️ Installation](#-installation)\n- [👨‍🏫 Get Started](#-get-started)\n- [🏋️ Training](#-training)\n- [📊 Benchmark](#-benchmark)\n- [🖊️ Citation](#-citation)\n- [🤝 Acknowledgement](#-acknowledgement)\n- [✨ Awesome Works using EditReward](#-awesome-works-using-editreward)\n- [🎫 License](#-license)\n---\n\n## 🚀 Quick Start\n\nEditReward is a VLM-based reward model trained on EditReward-Data that demonstrates superior alignment with human preferences.\n\n### 💻 
Installation\n\n\u003c!-- # Method 1: Pypi download and install for inference.\npip install hpsv3 --\u003e\n\n```bash\n\ngit clone https://github.com/TIGER-AI-Lab/EditReward.git\ncd EditReward\n\nconda create -n edit_reward python=3.10 -y\nconda activate edit_reward\npip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124\npip install datasets pillow openai -U megfile sentencepiece deepspeed fire omegaconf matplotlib peft trl==0.8.6 tensorboard scipy transformers==4.57.0 accelerate\n# Recommend: Install flash-attn\npip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.2.post1/flash_attn-2.7.2.post1+cu12torch2.5cxx11abiFALSE-cp310-cp310-linux_x86_64.whl\n\n```\n\n\n### 🚀 Usage\n\n#### Basic Command\n```python\nimport os\nimport sys\n# Add project root to Python path (optional, for local development)\nsys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))\nimport torch\nfrom EditReward import EditRewardInferencer\nfrom EditReward.inference_vl_edit import EditRewardVLInferencer\n\n# ------------------------------------------------------------------------------\n# Example script for evaluating edited images with EditReward\n# ------------------------------------------------------------------------------\n\n# Path to model checkpoint (update to your own local or HF path)\nCHECKPOINT_PATH = \"your/local/path/to/checkpoint\"\nCONFIG_PATH = \"config/EditReward-MiMo-VL-7B-SFT-2508.yaml\"\n\n# Initialize reward model\ninferencer = EditRewardInferencer(\n    config_path=CONFIG_PATH,\n    checkpoint_path=CHECKPOINT_PATH,\n    device=\"cuda\",        # or \"cpu\"\n    reward_dim=\"overall_detail\",    # choose reward dimension if applicable\n    rm_head_type=\"ranknet_multi_head\"\n)\n\n# (Optional) Unified inferencer for Qwen2.5-VL / Qwen3-VL:\n# Just switch CONFIG_PATH to either:\n# - \"config/EditReward-Qwen2.5-7B-VL.yaml\"\n# - \"config/EditReward-Qwen3-VL.yaml\"\n# 
inferencer = EditRewardVLInferencer(\n#     config_path=CONFIG_PATH,\n#     checkpoint_path=CHECKPOINT_PATH,\n#     device=\"cuda\",\n#     reward_dim=\"overall_detail\",\n#     rm_head_type=\"ranknet_multi_head\",\n# )\n\n# Example input data -----------------------------------------------------------\n# image_src = [\n#     \"../assets/examples/source_img_1.png\",\n#     \"../assets/examples/source_img_1.png\",\n# ]\n\n# image_paths = [\n#     \"../assets/examples/target_img_1.png\",\n#     \"../assets/examples/target_img_2.png\",\n# ]\nimage_src = [\n    \"your/local/path/to/source_image_1.jpg\",\n    \"your/local/path/to/source_image_2.jpg\",\n]\n\nimage_paths = [\n    \"your/local/path/to/edited_image_1.jpg\",\n    \"your/local/path/to/edited_image_2.jpg\",\n]\n\n# example instruction: \"Add a green bowl on the branch\"\n# prompts = [\n#     \"Add a green bowl on the branch\",\n#     \"Add a green bowl on the branch\"\n# ]\nprompts = [\n    \"your first editing instruction\",\n    \"your second editing instruction\"\n]\n\n# ------------------------------------------------------------------------------\n# Main evaluation modes\n# ------------------------------------------------------------------------------\nif __name__ == \"__main__\":\n    mode = \"pairwise_inference\"  # or \"single_inference\"\n\n    if mode == \"pairwise_inference\":\n        # ----------------------------------------------------------\n        # Pairwise comparison: compares two edited images side-by-side\n        # ----------------------------------------------------------\n        with torch.no_grad():\n          rewards = inferencer.reward(\n              prompts=prompts,\n              image_src=image_src,\n              image_paths=image_paths\n          )\n        scores = [reward[0].item() for reward in rewards]\n        print(f\"[Pairwise Inference] Image scores: {scores}\")\n\n    elif mode == \"single_inference\":\n        # 
----------------------------------------------------------\n        # Single image scoring: evaluates one edited image at a time\n        # ----------------------------------------------------------\n        with torch.no_grad():\n          rewards = inferencer.reward(\n              prompts=[prompts[0]],\n              image_src=[image_src[0]],\n              image_paths=[image_paths[0]]\n          )\n        print(f\"[Single Inference] Image 1 score: {[reward[0].item() for reward in rewards]}\")\n        \n        with torch.no_grad():\n          rewards = inferencer.reward(\n              prompts=[prompts[0]],\n              image_src=[image_src[0]],\n              image_paths=[image_paths[1]]\n          )\n        print(f\"[Single Inference] Image 2 score: {[reward[0].item() for reward in rewards]}\")\n```\n\n---\n\n\n## 📁 Dataset\n\n### EditReward-Data\n\u003cp align=\"left\"\u003e\n  \u003cimg src=\"assets/dataset_stat.png\" alt=\"dataset\" width=\"900\"/\u003e\n\u003c/p\u003e\n\u003c!-- \u003cdetails close\u003e --\u003e\n\n### Download EditReward\n\u003c!-- ```\nHPDv3 is comming soon! 
Stay tuned!\n``` --\u003e\n```bash\nhuggingface-cli download --repo-type dataset TIGER-Lab/EditReward-Data --local-dir /your-local-dataset-path\n```\n\n## 🏋️ Training\n\n### 🤖 Model Support\n\n- [x] **Qwen2.5-VL Series** \n- [x] **MiMo-VL Series**\n- [x] **Qwen3-VL Series**\n\n### 🚀 Training Command\n\nTo train the **EditReward** model, follow the detailed instructions in [Training Instructions](EditReward/TRAIN_README.md).\n\n#### Unified training entry (Qwen2.5-VL / Qwen3-VL)\n\nWe provide a unified training entry that automatically selects the correct model/collator based on `model_name_or_path`:\n\n```bash\n# Qwen2.5-VL\npython EditReward/EditReward/train_qwen_vl_edit.py --config EditReward/EditReward/config/EditReward-Qwen2.5-7B-VL.yaml\n\n# Qwen3-VL\npython EditReward/EditReward/train_qwen_vl_edit.py --config EditReward/EditReward/config/EditReward-Qwen3-VL.yaml\n```\n\n---\n\n## 📊 Benchmark\nTo evaluate **EditReward preference accuracy**, follow the detailed instructions in [Evaluation Instructions](EditReward/evaluate/README.md).\n\n\u003cdetails open\u003e\n\n\u003csummary\u003e Experimental Results: Alignment with Humans \u003c/summary\u003e\n\n| Method | GenAI-Bench | AURORA-Bench | ImagenHub | EditReward-Bench (Overall) |\n| :--- | :--- | :--- | :--- | :--- |\n| Random | 25.90 | 33.43 | -- | 13.84 |\n| Human-to-Human | -- | -- | 41.84 | -- |\n| ***Proprietary Models*** | | | | |\n| GPT-4o | 53.54 | 50.81 | 38.21 | 28.31 |\n| GPT-5 | 59.61 | 47.27 | \u003cu\u003e40.85\u003c/u\u003e | 37.81 |\n| Gemini-2.0-Flash | 53.32 | 44.31 | 23.69 | 33.47 |\n| Gemini-2.5-Flash | 57.01 | 47.63 | **41.62** | \u003cu\u003e38.02\u003c/u\u003e |\n| ***Open-Source VLMs*** | | | | |\n| Qwen2.5-VL-3B-Inst | 42.76 | 30.69 | -2.54 | 26.86 |\n| Qwen2.5-VL-7B-Inst | 40.48 | 38.62 | 18.59 | 29.75 |\n| Qwen2.5-VL-32B-Inst | 39.28 | 37.06 | 26.87 | 28.72 |\n| MiMo-VL-7B-SFT-2508 | 57.89 | 30.43 | 22.14 | 31.19 |\n| ADIEE | 59.96 | 55.56 | 34.50 | -- |\n| ***Reward Models (Ours)*** | | | | |\n| 
EditReward (on Qwen2.5-VL-7B) | \u003cu\u003e63.97\u003c/u\u003e | \u003cu\u003e59.50\u003c/u\u003e | 36.18 | 36.78 |\n| EditReward (on MiMo-VL-7B) | **65.72** | **63.62** | 35.20 | **38.42** |\n\u003c/details\u003e\n\n---\n\n\u003cdetails open\u003e\n\n\u003csummary\u003e EditReward-Bench Results \u003c/summary\u003e\n\n| Method | EditReward-Bench (K=2) | EditReward-Bench (K=3) | EditReward-Bench (K=4) | EditReward-Bench (Overall) |\n| :--- | :--- | :--- | :--- | :--- |\n| Random | 25.81 | 11.33 | 1.35 | 13.84 |\n| Human-to-Human | -- | -- | -- | -- |\n| ***Proprietary Models*** | | | | |\n| GPT-4o | 45.69 | 27.33 | 7.31 | 28.31 |\n| GPT-5 | \u003cu\u003e57.53\u003c/u\u003e | 38.51 | \u003cu\u003e12.84\u003c/u\u003e | 37.81 |\n| Gemini-2.0-Flash | 52.43 | 33.33 | **13.51** | 33.47 |\n| Gemini-2.5-Flash | **58.61** | \u003cu\u003e39.86\u003c/u\u003e | 12.16 | \u003cu\u003e38.02\u003c/u\u003e |\n| ***Open-Source VLMs*** | | | | |\n| Qwen2.5-VL-3B-Inst | 51.07 | 20.27 | 2.71 | 26.86 |\n| Qwen2.5-VL-7B-Inst | 52.69 | 24.67 | 3.38 | 29.75 |\n| Qwen2.5-VL-32B-Inst | 50.54 | 25.27 | 4.05 | 28.72 |\n| MiMo-VL-7B-SFT-2508 | 49.46 | 30.41 | 9.46 | 31.19 |\n| ADIEE | -- | -- | -- | -- |\n| ***Reward Models (Ours)*** | | | | |\n| EditReward (on Qwen2.5-VL-7B) | 56.99 | 36.00 | 10.81 | 36.78 |\n| EditReward (on MiMo-VL-7B) | 56.45 | **42.67** | 11.49 | **38.42** |\n\u003c/details\u003e\n\n---\n\n\n## 📚 Citation\n\nPlease kindly cite our paper if you use our code, data, models or results:\n\n```bibtex\n@article{wu2025editreward,\n  title={EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing},\n  author={Wu, Keming and Jiang, Sicong and Ku, Max and Nie, Ping and Liu, Minghao and Chen, Wenhu},\n  journal={arXiv preprint arXiv:2509.26346},\n  year={2025}\n}\n```\n\n\n---\n\n## 🙏 Acknowledgements\n\nWe would like to thank the [HPSv3](https://github.com/MizzenAI/HPSv3), [VideoAlign](https://github.com/KwaiVGI/VideoAlign) and 
[GenAI-Bench](https://github.com/TIGER-AI-Lab/GenAI-Bench) codebases for providing valuable references.\n\n---\n\n## \u003ca id=\"-awesome-works-using-editreward\"\u003e\u003c/a\u003e✨ Awesome Works using EditReward\n\n😊 Reve, CUHK, [PromptRL: Prompt Matters in RL for Flow-Based Image Generation](https://arxiv.org/abs/2602.01382).\n\n😊 Adobe, HKU, [Both Semantics and Reconstruction Matter: Making Representation Encoders Ready for Text-to-Image Generation and Editing](https://arxiv.org/abs/2512.17909).\n\n😊 Meta, [Multimodal RewardBench 2: Evaluating Omni Reward Models for Interleaved Text and Image](https://arxiv.org/pdf/2512.16899).\n\n😊 Google DeepMind, CUHK, [Image Diffusion Preview with Consistency Solver](https://arxiv.org/abs/2512.13592).\n\n---\n## ⭐ Star History [🔝](#-table-of-contents)\n\n[![Star History Chart](https://api.star-history.com/svg?repos=TIGER-AI-Lab/EditReward\u0026type=Date)](https://star-history.com/#TIGER-AI-Lab/EditReward\u0026Date)\n## 💬 Support\n\nFor questions and support:\n- **Issues**: [GitHub Issues](https://github.com/TIGER-AI-Lab/EditReward/issues)\n- **Email**: wukeming0608@gmail.com \u0026 wenhuchen@uwaterloo.ca\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftiger-ai-lab%2Feditreward","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftiger-ai-lab%2Feditreward","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftiger-ai-lab%2Feditreward/lists"}