{"id":27344448,"url":"https://github.com/Raj-08/Q-Flow","last_synced_at":"2025-04-12T17:06:22.968Z","repository":{"id":280292580,"uuid":"937353995","full_name":"Raj-08/Q-Flow","owner":"Raj-08","description":"Complete Reinforcement Learning Toolkit for Large Language Models!","archived":false,"fork":false,"pushed_at":"2025-03-14T11:00:32.000Z","size":1102,"stargazers_count":9,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-14T12:21:29.635Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Raj-08.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-22T21:44:54.000Z","updated_at":"2025-03-14T11:00:35.000Z","dependencies_parsed_at":"2025-03-02T15:39:11.387Z","dependency_job_id":null,"html_url":"https://github.com/Raj-08/Q-Flow","commit_stats":null,"previous_names":["raj-08/q-flow"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Raj-08%2FQ-Flow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Raj-08%2FQ-Flow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Raj-08%2FQ-Flow/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Raj-08%2FQ-Flow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Raj-08","download_url":"https://codeload.github.com/Raj-08/Q-Flow/tar.gz/refs/heads/main","host":{"name":"GitHu
b","url":"https://github.com","kind":"github","repositories_count":248602310,"owners_count":21131615,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-04-12T17:02:15.555Z","updated_at":"2025-04-12T17:06:22.962Z","avatar_url":"https://github.com/Raj-08.png","language":"Python","funding_links":[],"categories":["A01_文本生成_文本对话"],"sub_categories":["大语言对话模型及数据"],"readme":"# Q-FLOW\n![Alt Text](images/img-copy.jpg)\n\n\nWelcome to **Q-Flow**. We focus on advancing open-source development of Reinforcement Learning (RL) for LLMs. **QFlow** provides a complete toolbox that specifically addresses the reinforcement learning needs of Large Language Models. 
- Reasoning, Alignment, and more!\n\n### Highlights - Aha Moment with Limited Compute\n![Alt Text](images/aha.png)\n\nOur algorithm, Reinforce-Lite, achieved an Aha Moment on the Grade School Math dataset.\nhttps://medium.com/@rjusnba/overnight-end-to-end-rl-training-a-3b-model-on-a-grade-school-math-dataset-leads-to-reasoning-df61410c04c6\n\nTo get started with **QFlow**, clone the repository and install the dependencies:\n\n```bash\ngit clone https://github.com/Raj-08/Q-Flow.git\ncd Q-Flow\npip install -r requirements.txt\n```\n\n## Features\n\n- **Dedicated Toolbox**: A set of tools designed to handle reinforcement learning challenges specific to LLMs.\n- **Creative Solutions**: Breakthrough techniques and methodologies that make training language models faster, smarter, and more efficient.\n- **Scalable Performance**: Optimize LLMs with algorithms like **PPO**, **DPO**, and **GRPO**, designed for the unique needs of the LLM world.\n- **Hyperparameter Search**: We use Evolutionary Algorithms to find the right configuration of hyperparameters and make our training runs more effective.\n\n## Available RL Algorithms\n\nQFlow supports several powerful RL algorithms that can be used to fine-tune your large language models. Choose the one that fits your training requirements:\n\n- [x] **Reinforce-Lite** (displays emergence while remaining computationally affordable)\n- [x] **Monte-Carlo** (simple Monte Carlo RL: expectation over a sample of returns)\n- [x] **Group Relative Policy Optimization (GRPO)** (DeepSeek's RL algorithm)\n- [ ] **Proximal Policy Optimization (PPO)**\n- [ ] **Direct Preference Optimization (DPO)**\n- [ ] **Advantage Actor-Critic (A2C)**\n\n## Available Datasets\n\nQFlow has out-of-the-box support for reasoning datasets. We plan to expand into process reward datasets.\n\n- [x] **GSM8K** (Grade School Math)\n- [ ] **Math500**\n\nQFlow provides a simple command-line interface to train your models using different RL algorithms. 
Here are some examples:\n\n### Training with Different Algorithms\n\n```bash\n# Train using Reinforce-Lite\npython main.py --algorithm reinforce-lite \\\n               --model_name \"microsoft/Phi-3.5-mini-instruct\" \\\n               --dataset_name \"gsm8k\" \\\n               --batch_size 1 \\\n               --num_steps 5000 \\\n               --learning_rate 1e-6\n\n# Train using GRPO\npython main.py --algorithm grpo \\\n               --model_name \"microsoft/Phi-3.5-mini-instruct\" \\\n               --dataset_name \"gsm8k\" \\\n               --batch_size 1 \\\n               --group_size 10 \\\n               --num_steps 5000 \\\n               --learning_rate 1e-6\n\n# Train using Monte-Carlo\npython main.py --algorithm monte-carlo \\\n               --model_name \"microsoft/Phi-3.5-mini-instruct\" \\\n               --dataset_name \"gsm8k\" \\\n               --batch_size 1 \\\n               --num_steps 5000 \\\n               --learning_rate 1e-6\n```\n\n### Common Command Line Arguments\n\n- `--algorithm`: Choose the RL algorithm (`reinforce-lite`, `grpo`, `monte-carlo`)\n- `--model_name`: Name or path of the pretrained model to fine-tune\n- `--dataset_name`: Name of the dataset to use for training\n- `--batch_size`: Number of samples per training batch\n- `--num_steps`: Total number of training steps\n- `--learning_rate`: Learning rate for optimization\n- `--entropy_coef`: Entropy coefficient for exploration (default: 0.001)\n- `--group_size`: Group size for GRPO algorithm (default: 10)\n\n### Monitoring Training with TensorBoard\n\nQFlow automatically logs training metrics to TensorBoard.\n\n![Alt Text](images/screen3.png)\n\n![Alt Text](images/screen4.png)\n\nTo view the training progress:\n\n1. Start the TensorBoard server:\n```bash\ntensorboard --logdir runs/\n```\n\n2. 
Open your browser and navigate to:\n```\nhttp://localhost:6006\n```\n\nThe TensorBoard interface shows:\n- Training loss curves\n- Policy and entropy loss\n- Average rewards and success rates\n- Response lengths\n- Sample model outputs\n- Training hyperparameters\n\nYou can compare different runs by selecting them in the TensorBoard interface. Each run is tagged with the algorithm name and timestamp for easy identification.\n\n### Checkpoints and Model Saving\n\nModels are automatically saved during training:\n- Regular checkpoints every 100 steps\n- Final model after training completion\n- Training state and hyperparameters\n\nCheckpoints are saved in:\n```\ncheckpoints/{algorithm_name}_{timestamp}/\n```\n\nTo load a saved model for inference or continued training, use the checkpoint path as the `model_name` argument.\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FRaj-08%2FQ-Flow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FRaj-08%2FQ-Flow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FRaj-08%2FQ-Flow/lists"}