awesome-rl-in-generative-ai
Latest advancements of RL in generative AI
https://github.com/orlando-cs/awesome-rl-in-generative-ai
Flow Matching + RL (Paradigm I: Online Policy Optimization)
Surveys & Tutorials
- LLM Optimization: GRPO, PPO, and DPO (Analytics Vidhya, 2025)
- Preference Tuning LLMs: PPO, DPO, GRPO - A Simple Guide
- What is GRPO and Flow-GRPO? (Turing Post)
- Reinforcement Learning for Generative AI: A Survey (2023)
Flow Matching Foundations
Preference Alignment (Paradigm II: Direct Preference Learning)
- agent | Dynamic preference pairs in reasoning | Applied to LLM math
- as-energy formulation | Leverages Q-function, no extra model
- flow-matching) | 2024 | Preference-based | Vector field learning y⁻→y⁺ | Black-box friendly, avoids reward hacking
- tuning | Wasserstein-2 reg. | Prevents mode collapse
Timeline of Key Works
- Flow Matching Guide and Code - source implementations.
- TempFlow-GRPO
- Energy-Weighted Flow Matching (EFM) - as-energy formulation.
- Preference Alignment with Flow Matching (PFM) - based flow alignment; [GitHub Repo](https://github.com/jadehaus/preference-flow-matching).
- DanceGRPO
Code & Implementations
- Flow Matching Blog Implementations
- jadehaus/preference-flow-matching: PFM PyTorch implementation.
- DanceGRPO Project
Latest Papers