{"id":44134385,"url":"https://github.com/G-U-N/UniRL","last_synced_at":"2026-02-20T22:00:56.392Z","repository":{"id":317637395,"uuid":"1068211154","full_name":"G-U-N/UniRL","owner":"G-U-N","description":"a unified reinforcement learning toolbox for joint RL on language models and diffusion models ","archived":false,"fork":false,"pushed_at":"2026-02-07T01:28:50.000Z","size":29670,"stargazers_count":66,"open_issues_count":1,"forks_count":1,"subscribers_count":1,"default_branch":"prompt_rl","last_synced_at":"2026-02-07T13:02:45.052Z","etag":null,"topics":["diffusion-models","flow-matching","generative-ai","language-model","reinforcement-learning"],"latest_commit_sha":null,"homepage":"https://g-u-n.github.io/projects/promptrl/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/G-U-N.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-10-02T02:52:49.000Z","updated_at":"2026-02-07T11:31:35.000Z","dependencies_parsed_at":"2025-10-02T05:37:29.149Z","dependency_job_id":"972ff795-f712-4406-9d6e-b62ade5e2538","html_url":"https://github.com/G-U-N/UniRL","commit_stats":null,"previous_names":["g-u-n/unirl"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/G-U-N/UniRL","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/G-U-N%2FUniRL","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/G-U-N%2FUniRL/tags","releases_url":"https://repos.ecosyste.ms/api/v
1/hosts/GitHub/repositories/G-U-N%2FUniRL/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/G-U-N%2FUniRL/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/G-U-N","download_url":"https://codeload.github.com/G-U-N/UniRL/tar.gz/refs/heads/prompt_rl","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/G-U-N%2FUniRL/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29666419,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-20T19:49:36.704Z","status":"ssl_error","status_checked_at":"2026-02-20T19:44:05.372Z","response_time":59,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["diffusion-models","flow-matching","generative-ai","language-model","reinforcement-learning"],"created_at":"2026-02-08T23:00:22.911Z","updated_at":"2026-02-20T22:00:56.385Z","avatar_url":"https://github.com/G-U-N.png","language":"Python","funding_links":[],"categories":["3D-assets"],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/logo.png\" width=\"30%\"\u003e\u003cbr\u003e\n  PromptRL\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://arxiv.org/abs/2602.01382\"\u003e\u003cimg src=\"https://img.shields.io/badge/arXiv-2602.01382-b31b1b.svg\" 
alt=\"arXiv\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://g-u-n.github.io/projects/promptrl/\"\u003e\u003cimg src=\"https://img.shields.io/badge/Project-Page-green.svg\" alt=\"Project Page\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://huggingface.co/wangfuyun/PrompRL\"\u003e\u003cimg src=\"https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-blue\" alt=\"HuggingFace\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n## Overview\n\n**PromptRL** is a framework that jointly trains language models (LMs) and flow-matching models (FMs) within a unified reinforcement learning loop for text-to-image generation. By incorporating LMs as adaptive prompt refiners, PromptRL addresses two critical limitations in current flow-based RL pipelines: *exploration collapse* due to insufficient generation diversity, and *prompt overfitting* where models memorize specific training formulations.\n\n\n## Installation\n\n```bash\nconda env create -f environment.yml\nconda activate unirl\npip install git+https://github.com/openai/CLIP.git\npip install git+https://github.com/huggingface/diffusers.git\npip install flash-attn==2.7.4.post1 --no-build-isolation\n\n# run gen.sh for evaluation\n# bash gen.sh\n```\n\n## Qualitative Results\n\n### Text-to-Image Generation\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/t2i_comparison.png\" width=\"85%\"\u003e\n\u003c/p\u003e\n\n### Instructional Image Editing\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/edit_comparison.png\" width=\"75%\"\u003e\n\u003c/p\u003e\n\n\n## Key Results\n\nPromptRL achieves **2× the sample efficiency** of flow-only RL and also yields an adaptive prompt refinement agent that improves test-time performance.\n\n### Summary\n\n| Benchmark | Metric | PromptRL w/ PE | Best Baseline |\n|:---|:---|:---:|:---:|\n| GenEval | Avg. 
Score ↑ | **0.97** | 0.92 (FlowGRPO) |\n| Aesthetic | PickScore ↑ | **24.05** | 23.63 (DiffusionNFT) |\n| Aesthetic | HPS ↑ | **32.03** | 31.79 (DiffusionNFT) |\n| OCR | OCR-1k ↑ | **0.98** | 0.89 (FlowGRPO) |\n| Image Editing | EditReward Avg. ↑ | **1.43** | 1.44 (ReasonEdit-Think) |\n\n---\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003e📊 GenEval Benchmark (Full Results)\u003c/b\u003e\u003c/summary\u003e\n\n\u003cbr\u003e\n\n| Model | 1 Obj. | 2 Obj. | Cnt. | Clr. | Pos. | Attr. | Avg.↑ |\n|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|\n| Show-o | 0.95 | 0.52 | 0.49 | 0.82 | 0.11 | 0.28 | 0.53 |\n| Emu3-Gen | 0.98 | 0.71 | 0.34 | 0.81 | 0.17 | 0.21 | 0.54 |\n| SD3 Medium | 0.98 | 0.74 | 0.63 | 0.67 | 0.34 | 0.36 | 0.62 |\n| FLUX.1-dev | 0.98 | 0.81 | 0.74 | 0.79 | 0.22 | 0.45 | 0.66 |\n| SD3.5 Large | 0.98 | 0.89 | 0.73 | 0.83 | 0.34 | 0.47 | 0.71 |\n| JanusFlow | 0.97 | 0.59 | 0.45 | 0.83 | 0.53 | 0.42 | 0.63 |\n| Janus-Pro-7B | 0.99 | 0.89 | 0.59 | 0.90 | 0.79 | 0.66 | 0.80 |\n| HiDream | 1.00 | 0.98 | 0.79 | 0.91 | 0.60 | 0.72 | 0.83 |\n| Seedream 3.0 | 0.99 | 0.96 | 0.91 | 0.93 | 0.47 | 0.80 | 0.84 |\n| Qwen-Image | 0.99 | 0.92 | 0.89 | 0.88 | 0.76 | 0.77 | 0.87 |\n| *RL-based* |  |  |  |  |  |  |  |\n| RePrompt | 0.98 | 0.87 | 0.77 | 0.85 | 0.62 | 0.49 | 0.76 |\n| FlowGRPO | 1.00 | 0.99 | 0.91 | 0.89 | 0.95 | 0.80 | 0.92 |\n| DiffusionNFT | 1.00 | 0.98 | 0.74 | 0.92 | 0.85 | 0.80 | 0.88 |\n| PromptRL w/o PE | 1.00 | 0.96 | 0.95 | 0.95 | 0.93 | 0.85 | 0.94 |\n| **PromptRL w/ PE** | **1.00** | **0.99** | **0.99** | **0.96** | **0.99** | **0.90** | **0.97** |\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003e🎨 Aesthetic \u0026 OCR Metrics (Full Results)\u003c/b\u003e\u003c/summary\u003e\n\n\u003cbr\u003e\n\n| Model | P.S. | HPS | U.R. 
| OCR-1k | TMDB | OpenLib |\n|:---|:---:|:---:|:---:|:---:|:---:|:---:|\n| SD1.5 | 20.92 | 23.71 | 2.00 | 0.05 | 0.13 | 0.08 |\n| SDXL | 22.14 | 26.67 | 2.78 | 0.13 | 0.20 | 0.09 |\n| SD3 Medium | 22.38 | 28.56 | 3.09 | — | 0.44 | 0.33 |\n| FLUX.1-schnell | 22.64 | 29.39 | 3.25 | 0.54 | 0.66 | 0.50 |\n| FLUX.2-klein | 22.79 | 29.03 | 3.29 | 0.55 | 0.22 | 0.46 |\n| Z-Image | 20.14 | 28.22 | 3.51 | 0.70 | 0.71 | 0.83 |\n| Qwen-Image | 23.05 | 30.40 | 3.53 | 0.65 | 0.79 | 0.94 |\n| Qwen-Image-2512 | 23.16 | 30.79 | 3.40 | 0.72 | 0.81 | 0.87 |\n| *RL-based* |  |  |  |  |  |  |\n| FlowGRPO | 23.33 | 29.80 | 3.33 | 0.89 | 0.83 | 0.73 |\n| DiffusionNFT | 23.63 | 31.79 | 3.39 | 0.89 | 0.91 | 0.86 |\n| PromptRL w/o PE | 24.01 | 31.79 | 3.38 | 0.97 | 0.92 | 0.95 |\n| **PromptRL w/ PE** | **24.05** | **32.03** | **3.44** | **0.98** | **0.91** | **0.95** |\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003e✏️ Image Editing - EditReward (Full Results)\u003c/b\u003e\u003c/summary\u003e\n\n\u003cbr\u003e\n\n| Model | Swap | Style | Add. | Attr. | Env. 
| Removal | Avg.↑ |\n|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|\n| InstructPix2Pix | -0.24 | 0.91 | -0.45 | 0.45 | 0.48 | -0.80 | 0.02 |\n| MagicBrush | -0.38 | 0.36 | -0.78 | -0.80 | 0.91 | -0.85 | -0.27 |\n| LEDITS++ | -0.81 | -0.32 | -0.30 | -0.60 | -0.37 | -0.97 | -0.60 |\n| Qwen-Image-Edit | 1.11 | 1.14 | 0.95 | 0.90 | 1.39 | 0.61 | 1.03 |\n| FLUX.2-klein | 1.42 | 1.73 | 1.29 | 1.42 | 1.80 | 0.32 | 1.34 |\n| Nano Banana | 1.58 | 1.20 | 1.28 | 1.18 | 1.61 | 1.13 | 1.37 |\n| Step1X-Edit | 1.39 | 1.58 | 1.19 | 1.34 | 1.57 | 0.22 | 1.24 |\n| ReasonEdit | 1.51 | 1.43 | 1.19 | 1.47 | 1.58 | 1.14 | 1.40 |\n| ReasonEdit-Think | 1.52 | 1.47 | 1.19 | 1.44 | 1.69 | 1.27 | 1.44 |\n| FLUX.1-Kontext | 1.35 | 1.36 | 1.16 | 1.15 | 1.44 | 0.55 | 1.19 |\n| FLUX.1-Kontext w/ PE | 1.35 | 0.97 | 1.04 | 0.48 | 1.22 | 0.65 | 1.01 |\n| PromptRL w/o PE | 1.45 | 1.46 | 1.28 | 1.35 | 1.56 | 0.98 | 1.36 |\n| **PromptRL w/ PE** | **1.47** | **1.43** | **1.29** | **1.39** | **1.72** | **1.24** | **1.43** |\n\n\u003c/details\u003e\n\n\n\n## Citation\n\n```bibtex\n@article{wang2025promptrl,\n  title={PromptRL: Prompt Matters in RL for Flow-Based Image Generation},\n  author={Wang, Fu-Yun and Zhang, Han and Gharbi, Michael and Li, Hongsheng and Park, Taesung},\n  journal={arXiv preprint arXiv:2602.01382},\n  year={2026}\n}\n```\n\n```bibtex\n@article{wang2025unirl,\n  title={UniRL-Zero: Reinforcement Learning on Unified Models with Joint Language Model and Diffusion Model Experts},\n  author={Wang, Fu-Yun and Zhang, Han and Gharbi, Michael and Li, Hongsheng and Park, Taesung},\n  journal={arXiv preprint arXiv:2510.17937},\n  year={2025}\n}\n```\n\n## Acknowledgments\n\nThis codebase builds upon 
[UniRL-Zero](https://github.com/G-U-N/UniRL/tree/master).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FG-U-N%2FUniRL","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FG-U-N%2FUniRL","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FG-U-N%2FUniRL/lists"}