awesome-rl-in-generative-ai

✨✨ Latest advancements of RL in generative AI
https://github.com/orlando-cs/awesome-rl-in-generative-ai


🧩 Flow Matching + RL (Paradigm I: Online Policy Optimization)
- 📄 - aware GRPO | Studies reward-timing effects in FM | Better RL stability
- 📄
- 📄
- 📄 - domain scaling
- 📄 - SDE sliding window | RL on key steps only, ODE for others | Efficiency ↑ 50–71%
📖 Surveys & Tutorials
🔄 Flow Matching Foundations
🎯 Preference Alignment (Paradigm II: Direct Preference Learning)
- 📄 - agent | Dynamic preference pairs in reasoning | Applied to LLM math
- 📄 - as-energy formulation | Leverages Q-function, no extra model
- 📄 Preference Alignment with Flow Matching (PFM) | [GitHub Repo ![Stars](https://img.shields.io/github/stars/jadehaus/preference-flow-matching?style=social)](https://github.com/jadehaus/preference-flow-matching) | 2024 | Preference-based | Vector field learning y⁻→y⁺ | Black-box friendly, avoids reward hacking
- 📄 - tuning | Wasserstein-2 regularization | Prevents mode collapse
📅 Timeline of Key Works
- Flow Matching Guide and Code - source implementations.
- TempFlow-GRPO
- Energy-Weighted Flow Matching (EFM) - as-energy formulation.
- Preference Alignment with Flow Matching (PFM) - based flow alignment; [GitHub Repo ![Stars](https://img.shields.io/github/stars/jadehaus/preference-flow-matching?style=social)](https://github.com/jadehaus/preference-flow-matching).
- DanceGRPO
🛠️ Code & Implementations
- Flow Matching Blog Implementations
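- [jadehaus/preference-flow-matching](https://github.com/jadehaus/preference-flow-matching) ![Stars](https://img.shields.io/github/stars/jadehaus/preference-flow-matching?style=social): PFM PyTorch implementation.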
- DanceGRPO Project
🔥 Latest Papers
- arXiv
