# Debate, Train, Evolve

<div align="center">

[![EMNLP 2025](https://img.shields.io/badge/EMNLP_2025-Main_Conference-brightgreen?style=for-the-badge)](https://aclanthology.org/2025.emnlp-main.1666/)
[![Paper](https://img.shields.io/badge/Paper-ACL_Anthology-blue?style=for-the-badge)](https://aclanthology.org/2025.emnlp-main.1666/)
[![Website](https://img.shields.io/badge/Website-Live-orange?style=for-the-badge)](https://ctrl-gaurav.github.io/debate-train-evolve.github.io/)
[![Python](https://img.shields.io/badge/Python-3.9--3.13-blue?style=for-the-badge&logo=python)](https://python.org)
[![License](https://img.shields.io/badge/License-MIT-green?style=for-the-badge)](LICENSE)

**Self-Evolution of Language Model Reasoning via Multi-Agent Debate Traces**

**[Gaurav Srivastava](mailto:gks@vt.edu)**\* &nbsp;&bull;&nbsp; **[Zhenyu Bi](mailto:zhenyub@vt.edu)** &nbsp;&bull;&nbsp; **[Meng Lu](mailto:menglu@vt.edu)** &nbsp;&bull;&nbsp; **[Xuan Wang](mailto:xuanw@vt.edu)**&dagger;

[![Virginia Tech](https://img.shields.io/badge/Virginia_Tech-CS_Department-861F41?style=flat-square&logo=data:image/svg+xml;base64,PHN2ZyB3aWR0aD0iMjQiIGhlaWdodD0iMjQiIHZpZXdCb3g9IjAgMCAyNCAyNCIgZmlsbD0ibm9uZSIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj4KPHBhdGggZD0iTTEyIDJMMTMuMDkgOC4yNkwyMCA5TDEzLjA5IDE1Ljc0TDEyIDIyTDEwLjkxIDE1Ljc0TDQgOUwxMC45MSA4LjI2TDEyIDJaIiBmaWxsPSJjdXJyZW50Q29sb3IiLz4KPC9zdmc+)](https://cs.vt.edu/)
&nbsp;
[![EMNLP 2025](https://img.shields.io/badge/EMNLP-2025-2a4dff?style=flat-square&logo=academia)](https://2025.emnlp.org/)
&nbsp;
[![ACL Anthology](https://img.shields.io/badge/ACL_Anthology-2025.emnlp--main.1666-red?style=flat-square)](https://aclanthology.org/2025.emnlp-main.1666/)

<sub>\* Lead Author &nbsp;&nbsp; &dagger; Corresponding Author</sub>

[**Read the Paper**](https://aclanthology.org/2025.emnlp-main.1666/) &nbsp;|&nbsp; [**Website & Docs**](https://ctrl-gaurav.github.io/debate-train-evolve.github.io/) &nbsp;|&nbsp; [**GitHub**](https://github.com/ctrl-gaurav/Debate-Train-Evolve)

</div>

---

## Overview

**DTE (Debate, Train, Evolve)** is a ground-truth-free training framework that evolves language model reasoning through multi-agent debate traces. Multiple LLM copies debate using Reflect-Critique-Refine (RCR) prompting, generating high-quality training data without external supervision. The model is then fine-tuned on these traces with Group Relative Policy Optimization (GRPO), and the cycle repeats.

**Key results:**
- Up to **+13.92%** absolute accuracy gain (Qwen-2.5-1.5B on GSM-Plus)
- **+5.8%** average cross-domain generalization to science tasks
- Reduces sycophancy by **50%** via RCR prompting
- Single-model inference after training (no multi-agent overhead)
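The overview names GRPO as the fine-tuning algorithm. GRPO's defining step, as introduced in the DeepSeekMath work, is to sample a group of completions for the same prompt, score each one, and use each reward's deviation from the group mean (scaled by the group's standard deviation) as its advantage, so no separate value network is needed. The sketch below shows only that normalization step; `group_relative_advantages` is a hypothetical helper, not the trainer in `dte/training/`, and the rewards in the example are invented inputs.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Sketch of the GRPO advantage for one group of completions.

    A_i = (r_i - mean(r)) / (std(r) + eps), using the population std.
    The std variant and eps are assumptions; implementations differ.
    """
    if not rewards:
        return []
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps keeps the division finite when every completion got the same reward
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to the same question
rewards = [4.0, 1.0, 1.0, 2.0]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])
```

Completions rewarded above the group average receive positive advantages and are reinforced, while those below receive negative ones; a group with identical rewards produces all-zero advantages and therefore no learning signal.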
## Performance

| Model | GSM8K | GSM-Plus | MATH | ARC-Challenge | Best Gain |
|-------|-------|----------|------|---------------|-----------|
| Qwen-2.5-1.5B | 62.77 &rarr; **73.09** | 42.00 &rarr; **55.92** | 45.08 &rarr; **52.20** | 69.21 &rarr; 68.36 | **+13.92%** |
| Qwen-2.5-3B | 84.08 &rarr; **86.05** | 61.75 &rarr; **69.50** | 61.36 &rarr; **67.10** | 83.53 &rarr; **83.95** | **+7.75%** |
| Qwen-2.5-7B | 90.67 &rarr; 88.32 | 68.62 &rarr; **74.71** | 73.08 &rarr; **77.20** | 87.22 &rarr; **90.89** | **+6.09%** |
| Qwen-2.5-14B | 92.80 &rarr; **93.74** | 71.79 &rarr; **78.88** | 76.18 &rarr; **80.10** | 90.27 &rarr; **93.13** | **+7.09%** |
| Llama-3.2-3B | 72.55 &rarr; **75.06** | 45.67 &rarr; **53.79** | 39.76 &rarr; **43.80** | 73.12 &rarr; **77.23** | **+8.12%** |
| Llama-3.1-8B | 81.73 &rarr; **86.81** | 55.62 &rarr; **66.17** | 46.66 &rarr; **49.40** | 77.65 &rarr; **86.53** | **+10.55%** |

*Values show Base &rarr; Evolved accuracy (%). Bold = improvement. Best Gain is the largest Evolved minus Base difference across the four benchmarks, in absolute percentage points.*
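The Best Gain column can be reproduced from the table itself: for each model it is the largest Evolved minus Base difference across the four benchmarks. A short check using the numbers copied from the Performance table:

```python
# (base, evolved) accuracy pairs copied from the Performance table,
# in column order: GSM8K, GSM-Plus, MATH, ARC-Challenge
RESULTS = {
    "Qwen-2.5-1.5B": [(62.77, 73.09), (42.00, 55.92), (45.08, 52.20), (69.21, 68.36)],
    "Qwen-2.5-3B": [(84.08, 86.05), (61.75, 69.50), (61.36, 67.10), (83.53, 83.95)],
    "Qwen-2.5-7B": [(90.67, 88.32), (68.62, 74.71), (73.08, 77.20), (87.22, 90.89)],
    "Qwen-2.5-14B": [(92.80, 93.74), (71.79, 78.88), (76.18, 80.10), (90.27, 93.13)],
    "Llama-3.2-3B": [(72.55, 75.06), (45.67, 53.79), (39.76, 43.80), (73.12, 77.23)],
    "Llama-3.1-8B": [(81.73, 86.81), (55.62, 66.17), (46.66, 49.40), (77.65, 86.53)],
}


def best_gain(pairs):
    """Largest evolved-minus-base difference, in absolute accuracy points."""
    # Round to 2 decimals to hide floating-point noise in the subtraction
    return round(max(evolved - base for base, evolved in pairs), 2)


for model, pairs in RESULTS.items():
    print(f"{model}: +{best_gain(pairs):.2f}")
```

Because the gain is a difference of two percentages, it is measured in percentage points, not a relative improvement: Qwen-2.5-1.5B on GSM-Plus goes from 42.00 to 55.92, which is +13.92 points.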
## Installation

**Prerequisites:** Python 3.9+ and a CUDA GPU (for training). Debate-only mode works on CPU.

```bash
# Quick setup (conda)
git clone https://github.com/ctrl-gaurav/Debate-Train-Evolve.git
cd Debate-Train-Evolve
bash setup.sh

# Or manual install
python -m venv dte_env && source dte_env/bin/activate
pip install -r requirements.txt
pip install -e .

# Verify
python main.py info
```

## Quick Start

### Python API

```python
import dte

# Run a multi-agent debate with a single call
result = dte.debate(
    "What is 15 * 24?",
    model="Qwen/Qwen2.5-0.5B-Instruct",
    num_agents=3,
    max_rounds=3,
    task_type="math",
)
print(result.final_answer)       # "360"
print(result.consensus_reached)  # True
```
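`result.consensus_reached` reports whether the agents converged, but the README does not spell out the rule `dte.debate()` applies. One common choice in multi-agent debate is to treat unanimous agreement as consensus and otherwise fall back to a majority vote. The sketch below illustrates that idea with a hypothetical helper, `summarize_round`, and a hypothetical `RoundSummary` type; neither is part of the `dte` package.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class RoundSummary:
    """Hypothetical result type for this sketch (not dte's result object)."""

    final_answer: str
    consensus_reached: bool


def summarize_round(answers: List[str]) -> RoundSummary:
    """Majority-vote the agents' extracted answers; consensus means unanimity."""
    if not answers:
        raise ValueError("need at least one agent answer")
    # Ignore surrounding whitespace so " 360" and "360" count as the same answer
    normalized = [a.strip() for a in answers]
    # most_common() lists tied answers in first-seen order, so ties go to the earliest
    answer, votes = Counter(normalized).most_common(1)[0]
    return RoundSummary(final_answer=answer, consensus_reached=votes == len(normalized))


print(summarize_round(["360", "360", "350"]))  # 2-1 split: majority answer, no consensus
print(summarize_round(["360", " 360", "360"]))  # unanimous after normalization
```

Requiring unanimity is deliberately stricter than a majority: with three agents, a 2-1 split still yields a final answer, but this sketch reports it as no consensus, which is the case where another debate round would be worth running.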
### CLI

```bash
# Single query debate
python main.py debate --query "What is 15 * 24?" --agents 3 --rounds 3

# Dataset evaluation
python main.py debate --dataset gsm8k --samples 20 --verbose

# Full pipeline (debate -> train -> evolve)
python main.py run --config config.yaml
```

### Full Pipeline

```python
import dte

pipeline = dte.from_config("config.yaml")
results = pipeline.run_complete_pipeline()
print(f"Improvement: {results['total_improvement']:.2%}")
```

## Project Structure

```
Debate-Train-Evolve/
├── dte/                        # Main package
│   ├── __init__.py             # Public API: dte.debate(), dte.from_config()
│   ├── core/                   # Config, pipeline, evaluator, logger
│   ├── debate/                 # Multi-agent debate (agent, manager, prompts)
│   ├── training/               # GRPO trainer + reward model
│   ├── data/                   # Dataset management + data generation
│   └── utils/                  # Answer extraction, helpers
├── examples/                   # 6 usage examples
├── tests/                      # Unit + GPU integration tests
├── config.yaml                 # Default configuration
├── main.py                     # CLI entry point
└── pyproject.toml              # Package metadata
```
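The tree places answer extraction in `dte/utils/`. Comparing agents, checking consensus, and scoring math answers typically depend on turning a free-form response into a comparable final answer. The README does not document that module's API, so the following is only an illustrative sketch with a hypothetical function name, `extract_numeric_answer`. It prefers a `\boxed{...}` expression (the convention in MATH solutions), then a GSM8K-style `#### 42` final line, and otherwise falls back to the last number in the text.

```python
import re
from typing import Optional

# Hypothetical patterns for this sketch, not taken from dte/utils/
_BOXED = re.compile(r"\\boxed\{([^{}]*)\}")  # \boxed{...} without nested braces
_HASHES = re.compile(r"####\s*(-?\d[\d,]*(?:\.\d+)?)")  # GSM8K-style final line
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")  # any integer or decimal


def extract_numeric_answer(text: str) -> Optional[str]:
    """Return the final answer string, or None if no candidate is found.

    Priority: last boxed expression, then last '#### n' marker, then last number.
    Thousands separators are stripped from numeric matches.
    """
    boxed = _BOXED.findall(text)
    if boxed:
        return boxed[-1].strip()
    marked = _HASHES.findall(text)
    if marked:
        return marked[-1].replace(",", "")
    numbers = _NUMBER.findall(text)
    if numbers:
        return numbers[-1].replace(",", "")
    return None


print(extract_numeric_answer("15 * 24 = 360, so the answer is 360."))
print(extract_numeric_answer("Step 1: add the parts.\n#### 1,234"))
print(extract_numeric_answer(r"Therefore the result is \boxed{360}"))
```

The brace-free boxed pattern is a simplification: an answer such as `\boxed{\frac{1}{2}}` falls through to the last-number fallback and returns `2`, which is wrong, so a production extractor would need a small brace-matching parser.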
## Documentation

Full documentation is available on the [project website](https://ctrl-gaurav.github.io/debate-train-evolve.github.io/#/docs), including:

- **Installation & Setup** -- prerequisites, GPU support, development setup
- **Quick Start** -- Python API, CLI, component-level usage
- **API Reference** -- all public classes and functions
- **Configuration** -- complete YAML config reference
- **Training Guide** -- GRPO hyperparameters, multi-GPU, expected training times
- **Reward Functions** -- the 5 shaped reward functions (total max: 4.0)
- **Dataset Reference** -- 7 benchmarks (GSM8K, GSM-Plus, MATH, ARC, GPQA, CommonsenseQA)
- **CLI Reference** -- all commands and flags
- **Troubleshooting** -- OOM, model loading, consensus issues
- **FAQ** -- common questions answered

## CLI Commands

| Command | Description |
|---------|-------------|
| `python main.py run` | Run the complete DTE pipeline |
| `python main.py debate` | Standalone multi-agent debate |
| `python main.py generate` | Generate training data from debates |
| `python main.py train` | Train model with GRPO |
| `python main.py validate` | Validate a configuration file |
| `python main.py init` | Generate default config |
| `python main.py info` | Show system & GPU information |

## Contributing

```bash
pip install -e ".[dev]"

# Tests
pytest -m "not gpu" -v              # Unit tests (no GPU)
pytest tests/test_debate_integration.py -v  # GPU tests

# Lint & format
ruff check dte/ tests/
ruff format dte/ tests/
```

## Acknowledgments

This work was supported by NSF NAIRR Pilot with PSC Neocortex and NCSA Delta; Amazon, Cisco Research, Commonwealth Cyber Initiative, Amazon-Virginia Tech Center for Efficient and Robust Machine Learning, and the Sanghani Center for AI and Data Analytics at Virginia Tech.

## Citation

```bibtex
@inproceedings{srivastava2025debate,
  title={Debate, Train, Evolve: Self-Evolution of Language Model Reasoning},
  author={Srivastava, Gaurav and Bi, Zhenyu and Lu, Meng and Wang, Xuan},
  booktitle={Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
  year={2025},
  url={https://aclanthology.org/2025.emnlp-main.1666/}
}
```

## License

MIT License. See [LICENSE](LICENSE) for details.

---

<div align="center">

Made with &#10084;&#65039; by the DTE Research Team

</div>