https://github.com/the-swarm-corporation/agentgym

A framework making it effortless to convert any llm model into a reasoning agent like o1 or DeepSeek's r1

agents ai alibaba deepseek llms o1 qwen r1 rl

Host: GitHub
URL: https://github.com/the-swarm-corporation/agentgym
Owner: The-Swarm-Corporation
License: mit
Created: 2025-01-29T16:33:46.000Z (9 months ago)
Default Branch: main
Last Pushed: 2025-06-27T12:41:15.000Z (4 months ago)
Last Synced: 2025-06-29T13:44:50.260Z (4 months ago)
Topics: agents, ai, alibaba, deepseek, llms, o1, qwen, r1, rl
Language: Python
Homepage: https://swarms.ai
Size: 2.39 MB
Stars: 21
Watchers: 1
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE

# Agent Gym

![Agent Gym](images/steps.png)

[![Join our Discord](https://img.shields.io/badge/Discord-Join%20our%20server-5865F2?style=for-the-badge&logo=discord&logoColor=white)](https://discord.gg/swarms) [![Subscribe on YouTube](https://img.shields.io/badge/YouTube-Subscribe-red?style=for-the-badge&logo=youtube&logoColor=white)](https://www.youtube.com/@kyegomez3242) [![Connect on LinkedIn](https://img.shields.io/badge/LinkedIn-Connect-blue?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/kye-g-38759a207/) [![Follow on X.com](https://img.shields.io/badge/X.com-Follow-1DA1F2?style=for-the-badge&logo=x&logoColor=white)](https://x.com/kyegomezb)

Convert any model into an R1-like reasoning agent. Agent Gym builds on TRL, Hugging Face Transformers, and other libraries. This is a work in progress. Our goal is to make it easy to train any model into a reasoning agent.

Sources:

- [Open R1 Blog](https://huggingface.co/blog/open-r1)
- [GRPO documentation from TRL](https://huggingface.co/docs/trl/main/en/grpo_trainer)
- [Hugging Face Transformers documentation](https://huggingface.co/docs/transformers/main/en/index)

## Installation

```bash
pip3 install -U agentgym
```

## Usage

```python
from agentgym.r1_pipeline import R1Pipeline, SFTConfig

r1_pipeline = R1Pipeline(
    sft_model="Qwen/Qwen2-0.5B-Instruct",
    tokenizer_name="Qwen/Qwen2-0.5B-Instruct",
    sft_dataset="trl-lib/tldr",
    sft_args=SFTConfig(output_dir="/tmp"),
    only_grpo=True,
    model_name="Qwen/Qwen2-0.5B-Instruct",
)

r1_pipeline.run()
```
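The README does not say which reward signal the pipeline uses during GRPO training. As background, the TRL GRPO documentation listed under Sources describes reward functions that receive a list of completions and return one float per completion. The sketch below follows that shape in plain Python. The `<think>...</think>` format, the `THINK_PATTERN` name, and the `format_reward` function are illustrative assumptions, not part of `agentgym`, and nothing here implies that `R1Pipeline` accepts a custom reward function.

```python
import re

# Assumed output format (hypothetical): reasoning wrapped in <think>...</think>,
# followed by a non-empty final answer.
THINK_PATTERN = re.compile(r"^<think>.+?</think>\s*\S.*$", re.DOTALL)


def format_reward(completions, **kwargs):
    """Return 1.0 for each completion that follows the assumed format, else 0.0.

    `completions` is a list of plain strings; extra keyword arguments are ignored.
    """
    return [1.0 if THINK_PATTERN.match(c.strip()) else 0.0 for c in completions]


samples = [
    "<think>2 + 2 is 4</think> 4",  # reasoning and answer present
    "4",                            # no reasoning block
    "<think>reasoning</think>",     # reasoning but no answer
]
scores = format_reward(samples)
```

A binary format reward like this only checks structure. In practice it would usually be combined with a task-specific correctness reward.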

## Architecture

The pipeline has two training stages:

- SFT: Supervised Fine-Tuning
- GRPO: Group Relative Policy Optimization

model -> sft -> grpo -> reasoning model

```mermaid
graph TD;
    A[model] --> B[sft]
    B --> C[grpo]
    C --> D[reasoning model]
```
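To make the GRPO stage more concrete, here is a minimal sketch of its core idea: several completions are sampled for the same prompt, and each completion's reward is normalized against the mean and standard deviation of its own group. This illustrates the general technique only and is not code from `agentgym` or TRL. The function name, the `eps` value, and the use of the population standard deviation are assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of completions for a single prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    `eps` (an assumed value) avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions sampled for the same prompt, scored by some reward function.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
# Above-average completions get positive advantages, below-average ones negative.
```

Because the baseline comes from the group itself, this approach does not need a separately trained value model to estimate advantages.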

## License

MIT
