{"id":31228768,"url":"https://github.com/bytedance/OneReward","last_synced_at":"2025-09-22T07:03:16.444Z","repository":{"id":313732083,"uuid":"1047016239","full_name":"bytedance/OneReward","owner":"bytedance","description":null,"archived":false,"fork":false,"pushed_at":"2025-09-15T15:03:52.000Z","size":25,"stargazers_count":209,"open_issues_count":7,"forks_count":11,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-09-15T16:31:51.734Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bytedance.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-08-29T15:47:18.000Z","updated_at":"2025-09-15T15:03:55.000Z","dependencies_parsed_at":"2025-09-08T06:18:19.041Z","dependency_job_id":"51fbc7f2-bc48-49ce-8170-367404ae2e1d","html_url":"https://github.com/bytedance/OneReward","commit_stats":null,"previous_names":["bytedance/onereward"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/bytedance/OneReward","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bytedance%2FOneReward","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bytedance%2FOneReward/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bytedance%2FOneReward/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bytedance%2FOneReward/manifest
s","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bytedance","download_url":"https://codeload.github.com/bytedance/OneReward/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bytedance%2FOneReward/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":276361150,"owners_count":25628853,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-22T02:00:08.972Z","response_time":79,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-09-22T07:01:54.430Z","updated_at":"2025-09-22T07:03:16.436Z","avatar_url":"https://github.com/bytedance.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# OneReward\r\n\r\nOfficial implementation of **[OneReward: Unified Mask-Guided Image Generation via Multi-Task Human Preference Learning](https://arxiv.org/abs/xxxx)**\r\n\r\n[![arXiv](https://img.shields.io/badge/arXiv-Paper-\u003cCOLOR\u003e.svg)](https://arxiv.org/abs/2508.21066) [![model](https://img.shields.io/badge/🤗-Model-yellow)](https://huggingface.co/bytedance-research/OneReward) [![GitHub Pages](https://img.shields.io/badge/GitHub-Project-blue?logo=github)](https://one-reward.github.io/)\r\n\u003cbr\u003e\r\n\r\n\u003cp align=\"center\"\u003e\r\n  \u003cimg src=\"assets/show.jpg\" alt=\"assert\" 
width=\"800\"\u003e\r\n\u003c/p\u003e\r\n\r\n## 🚀 TODO\r\n- [x] Release arXiv paper.\r\n- [x] Release inference code.\r\n- [x] Release `FLUX.1-Fill-dev[OneReward]` and `FLUX.1-Fill-dev[OneRewardDynamic]` mask-guided edit checkpoints.\r\n- [ ] Release `FLUX.1-dev[OneReward]` text-to-image checkpoints.\r\n- [ ] Future open-source plan.\r\n\r\n## Introduction\r\nWe propose **OneReward**, a novel RLHF methodology for the visual domain that employs Qwen2.5-VL as a generative reward model to enhance multi-task reinforcement learning, significantly improving the policy model’s generation ability across multiple subtasks. Building on OneReward, we develop **Seedream 3.0 Fill**, a unified SOTA image editing model capable of effectively handling diverse tasks including image fill, image extend, object removal, and text rendering. It surpasses several leading commercial and open-source systems, including Ideogram, Adobe Photoshop, and FLUX Fill [Pro]. Finally, based on FLUX Fill [dev], we are thrilled to release **FLUX.1-Fill-dev-OneReward**, which outperforms the closed-source FLUX Fill [Pro] in inpainting and outpainting tasks, serving as a powerful new baseline for future research in unified image editing.\r\n\r\n\u003ctable\u003e\r\n  \u003ctr\u003e\r\n    \u003ctd\u003e\r\n      \u003cimg src=\"assets/radius_inpaint.png\" width=\"512\"\u003e\r\n      \u003cp align=\"center\"\u003e\u003cb\u003eImage Fill\u003c/b\u003e\u003c/p\u003e\r\n    \u003c/td\u003e\r\n    \u003ctd\u003e\r\n      \u003cimg src=\"assets/radius_outpaint_w.png\" width=\"512\"\u003e\r\n      \u003cp align=\"center\"\u003e\u003cb\u003eImage Extend with Prompt\u003c/b\u003e\u003c/p\u003e\r\n    \u003c/td\u003e\r\n  \u003c/tr\u003e\r\n  \u003ctr\u003e\r\n    \u003ctd\u003e\r\n      \u003cimg src=\"assets/radius_outpaint_wo.png\" width=\"512\"\u003e\r\n      \u003cp align=\"center\"\u003e\u003cb\u003eImage Extend without Prompt\u003c/b\u003e\u003c/p\u003e\r\n    \u003c/td\u003e\r\n    \u003ctd\u003e\r\n      
\u003cimg src=\"assets/radius_eraser.png\" width=\"512\"\u003e\r\n      \u003cp align=\"center\"\u003e\u003cb\u003eObject Removal\u003c/b\u003e\u003c/p\u003e\r\n    \u003c/td\u003e\r\n  \u003c/tr\u003e\r\n  \u003ccaption align=\"bottom\" style=\"font-weight: bold; margin-top: 10px;\"\u003eSeedream 3.0 Fill Performance Overview\u003c/caption\u003e\r\n\u003c/table\u003e\r\n\r\n## Quick Start\r\n\r\n1. Make sure `transformers\u003e=4.51.3` is installed (required for Qwen2.5-VL support).\r\n\r\n2. Install the latest version of diffusers:\r\n```\r\npip install -U diffusers\r\n```\r\n\r\nThe following code snippet shows how to use the model to generate images from a text prompt and an input mask. It supports inpainting (image fill), outpainting (image extend), and erasing (object removal). Because the model is fully trained, `FluxFillCFGPipeline` with CFG is required; you can find it in [pipeline_flux_fill_with_cfg.py](src/pipeline_flux_fill_with_cfg.py).\r\n\r\n```python\r\nimport torch\r\nfrom diffusers.utils import load_image\r\nfrom diffusers import FluxTransformer2DModel\r\n\r\nfrom src.pipeline_flux_fill_with_cfg import FluxFillCFGPipeline\r\n\r\ntransformer_onereward = FluxTransformer2DModel.from_pretrained(\r\n    \"bytedance-research/OneReward\",\r\n    subfolder=\"flux.1-fill-dev-OneReward-transformer\",\r\n    torch_dtype=torch.bfloat16\r\n)\r\n\r\npipe = FluxFillCFGPipeline.from_pretrained(\r\n    \"black-forest-labs/FLUX.1-Fill-dev\",\r\n    transformer=transformer_onereward,\r\n    torch_dtype=torch.bfloat16).to(\"cuda\")\r\n\r\n# Image Fill\r\nimage = load_image('assets/image.png')\r\nmask = load_image('assets/mask_fill.png')\r\nimage = pipe(\r\n    prompt='the words \"ByteDance\", and in the next line \"OneReward\"',\r\n    negative_prompt=\"nsfw\",\r\n    image=image,\r\n    mask_image=mask,\r\n    height=image.height,\r\n    width=image.width,\r\n    guidance_scale=1.0,\r\n    true_cfg=4.0,\r\n    num_inference_steps=50,\r\n    
generator=torch.Generator(\"cpu\").manual_seed(0)\r\n).images[0]\r\nimage.save(\"image_fill.jpg\")\r\n```\r\n\r\n\u003ctable\u003e\r\n  \u003ctr\u003e\r\n    \u003ctd\u003e\r\n      \u003cimg src=\"assets/image.png\" width=\"512\"\u003e\r\n      \u003cp align=\"center\"\u003e\u003cb\u003einput\u003c/b\u003e\u003c/p\u003e\r\n    \u003c/td\u003e\r\n    \u003ctd\u003e\r\n      \u003cimg src=\"assets/result_fill.jpg\" width=\"512\"\u003e\r\n      \u003cp align=\"center\"\u003e\u003cb\u003eoutput\u003c/b\u003e\u003c/p\u003e\r\n    \u003c/td\u003e\r\n  \u003c/tr\u003e\r\n\u003c/table\u003e\r\n\r\nAlternatively, you can run the full inference demos in [demo_one_reward.py](src/examples/demo_one_reward.py) and [demo_one_reward_dynamic.py](src/examples/demo_one_reward_dynamic.py):\r\n```\r\npython3 -m src.examples.demo_one_reward\r\npython3 -m src.examples.demo_one_reward_dynamic\r\n```\r\n\r\n## Model\r\n### FLUX.1-Fill-dev[OneReward], trained with Alg. 1 in the paper\r\n```python\r\ntransformer_onereward = FluxTransformer2DModel.from_pretrained(\r\n    \"bytedance-research/OneReward\",\r\n    subfolder=\"flux.1-fill-dev-OneReward-transformer\",\r\n    torch_dtype=torch.bfloat16\r\n)\r\n\r\npipe = FluxFillCFGPipeline.from_pretrained(\r\n    \"black-forest-labs/FLUX.1-Fill-dev\",\r\n    transformer=transformer_onereward,\r\n    torch_dtype=torch.bfloat16).to(\"cuda\")\r\n```\r\n\r\n### FLUX.1-Fill-dev[OneRewardDynamic], trained with Alg. 2 in the paper\r\n```python\r\ntransformer_onereward_dynamic = FluxTransformer2DModel.from_pretrained(\r\n    \"bytedance-research/OneReward\",\r\n    subfolder=\"flux.1-fill-dev-OneRewardDynamic-transformer\",\r\n    torch_dtype=torch.bfloat16\r\n)\r\n\r\npipe = FluxFillCFGPipeline.from_pretrained(\r\n    \"black-forest-labs/FLUX.1-Fill-dev\",\r\n    transformer=transformer_onereward_dynamic,\r\n    torch_dtype=torch.bfloat16).to(\"cuda\")\r\n```\r\n\r\n## Multi-task Usage\r\n### Object Removal\r\n```python\r\nimage = 
load_image('assets/image.png')\r\nmask = load_image('assets/mask_remove.png')\r\nimage = pipe(\r\n    prompt='remove',  # object removal uses a fixed prompt\r\n    negative_prompt=\"nsfw\",\r\n    image=image,\r\n    mask_image=mask,\r\n    height=image.height,\r\n    width=image.width,\r\n    guidance_scale=1.0,\r\n    true_cfg=4.0,\r\n    num_inference_steps=50,\r\n    generator=torch.Generator(\"cpu\").manual_seed(0)\r\n).images[0]\r\nimage.save(\"object_removal.jpg\")\r\n```\r\n\r\n### Image Extend with prompt\r\n```python\r\nimage = load_image('assets/image2.png')\r\nmask = load_image('assets/mask_extend.png')\r\nimage = pipe(\r\n    prompt='Deep in the forest, surrounded by colorful flowers',\r\n    negative_prompt=\"nsfw\",\r\n    image=image,\r\n    mask_image=mask,\r\n    height=image.height,\r\n    width=image.width,\r\n    guidance_scale=1.0,\r\n    true_cfg=4.0,\r\n    num_inference_steps=50,\r\n    generator=torch.Generator(\"cpu\").manual_seed(0)\r\n).images[0]\r\nimage.save(\"image_extend_w_prompt.jpg\")\r\n```\r\n\r\n### Image Extend without prompt\r\n```python\r\nimage = load_image('assets/image2.png')\r\nmask = load_image('assets/mask_extend.png')\r\nimage = pipe(\r\n    prompt='high-definition, perfect composition',  # image extend without a prompt uses this fixed prompt\r\n    negative_prompt=\"nsfw\",\r\n    image=image,\r\n    mask_image=mask,\r\n    height=image.height,\r\n    width=image.width,\r\n    guidance_scale=1.0,\r\n    true_cfg=4.0,\r\n    num_inference_steps=50,\r\n    generator=torch.Generator(\"cpu\").manual_seed(0)\r\n).images[0]\r\nimage.save(\"image_extend_wo_prompt.jpg\")\r\n```\r\n\r\n\r\n## License Agreement\r\n
The code is licensed under Apache 2.0. The model is licensed under CC BY-NC 4.0.\r\n\r\n## Citation\r\n```\r\n@article{gong2025onereward,\r\n  title={OneReward: Unified Mask-Guided Image Generation via Multi-Task Human Preference Learning},\r\n  author={Gong, Yuan and Wang, Xionghui and Wu, Jie and Wang, Shiyin and Wang, Yitong and Wu, Xinglong},\r\n  journal={arXiv preprint arXiv:2508.21066},\r\n  year={2025}\r\n}\r\n```\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbytedance%2FOneReward","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbytedance%2FOneReward","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbytedance%2FOneReward/lists"}