https://github.com/axon-rl/gem

A Gym for Agentic LLMs
https://github.com/axon-rl/gem
gym llm rl
Last synced: 9 months ago
JSON representation
A Gym for Agentic LLMs
Host: GitHub
URL: https://github.com/axon-rl/gem
Owner: axon-rl
License: apache-2.0
Created: 2025-05-27T06:38:29.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2025-10-04T07:01:44.000Z (9 months ago)
Last Synced: 2025-10-04T09:07:32.265Z (9 months ago)
Topics: gym, llm, rl
Language: Python
Homepage:
Size: 680 KB
Stars: 176
Watchers: 1
Forks: 10
Open Issues: 8
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
Awesome Lists containing this project

awesome-agent-rl-environments - axon-rl/gem
README

          


# GEM: A Gym for Agentic LLMs

[![Notion blog](https://img.shields.io/badge/Notion-000000?style=for-the-badge&logo=notion&logoColor=white)](https://axon-rl.notion.site/gem) 

[![🌐 Axon-RL](https://img.shields.io/badge/-AxonRL%20project-5865F2?style=for-the-badge)](https://axon-rl.github.io/) 

[![Hugging Face Collection](https://img.shields.io/badge/AxonRL-fcd022?style=for-the-badge&logo=huggingface&logoColor=000&labelColor)](https://huggingface.co/axon-rl) 

[![Documentation](https://img.shields.io/badge/Documentation-blue?style=for-the-badge&logo=readthedocs&logoColor=white)](https://axon-rl.github.io/gem/)



  

    Links •

    Installation •

    Interface •

    Integration Examples •

    Roadmap •

    Contributing •

    Acknowledgement

  





We’re entering the era of experience, where LLM training moves beyond static datasets, towards LLM agents learning from experience gathered in complex, expressive environments. As a step towards this we introduce **GEM**, our open-source **G**eneral **E**xperience **M**aker.

Like OpenAI [Gym](https://github.com/openai/gym) for traditional RL, GEM is a dedicated environment simulator for the age of LLMs. GEM offers a diverse range of environments with clean, standardized interfaces, making it easy to integrate with existing RL training frameworks (Oat, Verl, etc.). In addition, GEM features tool integration, flexible and easy-to-modify wrappers, async vectorized environment execution to maximize throughput, multi-environment training, and more … everything you need to make LLM agent RL training simple.

## Links

  * 📜 [Initial Blog](https://axon-rl.notion.site/gem)

  * 🚀 [Blog release tweet](https://x.com/zzlccc/status/1951358948587741295)

  * 📄 [Paper](https://arxiv.org/pdf/2510.01051)

  * 📄 [Documentation](https://axon-rl.github.io/gem/)

## Installation

Install `GEM` from PyPI:

```bash

pip install -U gem-llm

```

To use the `search` tool, run the following to install extra dependencies: 

```bash

pip install -U 'gem-llm[search]'

conda install -c pytorch -c nvidia faiss-gpu=1.8.0

```

To use the `mcp` tool and [MCPMark](https://mcpmark.ai/) environment, run the following to install extra dependencies: 

```bash

pip install -U `gem-llm[mcp]`

# install MCPMark

git clone git@github.com:axon-rl/mcpmark.git; cd mcpmark

pip install -e .

playwright install # If you'll use browser-based tasks, install Playwright browsers first

```

## Interface

GEM's interface closely follows Gym's API. Here's an example using the "game:GuessTheNumber-v0" environment: 

```python 

import gem

# List all supported environments

gem.print_envs()

# Initialize the environment

env = gem.make("game:GuessTheNumber-v0")

# Reset the environment to generate the first observation

observation, info = env.reset()

# Start the agent-environment loop

while True:

    action = env.sample_random_action() # insert policy here, e.g.,

    # (pseudocode) action = llm.generate(observation)

    # apply action and receive next observation, reward

    # and whether the episode has ended

    next_observation, reward, terminated, truncated, info = env.step(action)

    print("OBS", observation)

    print("ACT", action)

    # update the policy (online) here

    # e.g., policy = learn(policy, observation, action, reward, info)

    observation = next_observation

    # Exit when the episode terminates

    if terminated or truncated:

        break

```

### Tool Integration Examples

Below are examples for enabling tools within environments.

**Example using the Python tool:**

```python

from transformers import AutoTokenizer

import gem

from gem.tools.python_code_tool import PythonCodeTool

from gem.tools.tool_env_wrapper import ToolEnvWrapper

from gem.wrappers.wrapper_factory import WRAPPER_FACTORY

env = gem.make("math:GSM8K")

tool = PythonCodeTool()

wrapped_env = ToolEnvWrapper(env, tools=[tool])

wrapped_env = WRAPPER_FACTORY["concat_chat"](

    wrapped_env, tokenizer=AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")

)

obs, info = wrapped_env.reset()

# we ignore the obs and use a dummy action

dummy_action = "Let me compare 9.9 and 9.11 using python.print('9.9 > 9.11?', 9.9 > 9.11)"

obs, reward, terminated, truncated, info = wrapped_env.step(dummy_action)

print(obs)

# continue to sample the next response given the tool results ...

wrapped_env.close()

```

**Example using the search tool:**

```python

# assume you have search server running

env = gem.make("game:GuessTheNumber-v0", max_turns=2)

tool = SearchTool(search_url="http://localhost:8000/retrieve", topk=2)

wrapped_env = ToolEnvWrapper(env, tools=[tool], max_tool_uses=1)

wrapped_env = WRAPPER_FACTORY['concat_chat'](wrapped_env, tokenizer=AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B"))

wrapped_env.reset()

dummy_action = "I need to search for Python list comprehension examplesPython list comprehension examples"

obs, reward, terminated, truncated, info = wrapped_env.step(dummy_action)

print(obs)

```

Click to get the complete runnable code

```python

import subprocess

import time

from transformers import AutoTokenizer

import gem

from gem.tools.search_tool import SearchTool

from gem.tools.tool_env_wrapper import ToolEnvWrapper

from gem.wrappers.wrapper_factory import WRAPPER_FACTORY

# start the search server

serp_api_key = "add you api key" # get api at https://serpapi.com/manage-api-key

server_process = subprocess.Popen([

    'python', '-m', 'gem.tools.search_engine.serp_search_server',

    '--search_url', 'https://serpapi.com/search',

    '--topk', '2', '--serp_api_key', serp_api_key

])

time.sleep(5)

# interact using search tool

env = gem.make("game:GuessTheNumber-v0", max_turns=2)

tool = SearchTool(search_url="http://localhost:8000/retrieve", topk=2)

wrapped_env = ToolEnvWrapper(env, tools=[tool], max_tool_uses=1)

wrapped_env = WRAPPER_FACTORY['concat_chat'](wrapped_env, tokenizer=AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B"))

wrapped_env.reset()

dummy_action = "I need to search for Python list comprehension examplesPython list comprehension examples"

obs, reward, terminated, truncated, info = wrapped_env.step(dummy_action)

print(obs)

```

## Integration Examples

We demonstrate how to leverage existing LLM RL infrastructure to train agents with GEM. First, we show how to train game agents using [Oat](https://github.com/sail-sg/oat). 

Before running the training, ensure you set up the development environment by following the [instructions](https://github.com/axon-rl/gem/tree/main/examples#training-with-oat). 

Run the following command to train an agent for the game environment `game:GuessTheNumber-v0`: 

```python 

python train.py \

    --env_id game:GuessTheNumber-v0 \

    --wrappers concat \

    --gamma 0.9 \

    --norm_adv \

    --gpus 8 \

    --gradient-checkpointing \

    --num_samples 1 \

    --rollout_batch_size 128 \

    --num_envs 2 \

    --rollout_batch_size_per_device 16 \

    --pi_buffer_maxlen_per_device 16 \

    --pretrain Qwen/Qwen3-1.7B-Base \

    --enable_prefix_caching \

    --collocate \

    --vllm_sleep \

    --vllm_gpu_ratio 0.45 \

    --rnd-seed \

    --learning_rate 0.000001 \

    --lr_scheduler constant \

    --lr_warmup_ratio 0 \

    --num_ppo_epochs 2 \

    --train_batch_size 128 \

    --train_batch_size_per_device 1 \

    --beta 0 \

    --max_model_len 12800 \

    --generate_max_length 4096 \

    --temperature 1.0 \

    --top_p 1 \

    --eval_steps -1 \

    --save_steps -1 \

    --eval_temperature 0.6 \

    --eval_top_p 0.95 \

    --eval_generate_max_length 4096 \

    --max_train 65000 \

    --max_save_num 30 \

    --use-wb \

    --wb-run-name oat-qwen3-1.7b-base-game:GuessTheNumber-v0 \

    --wb_project gem \

    --debug

```

We also provide sample code for math, code, and general QA in the [examples](https://github.com/axon-rl/gem/tree/main/examples) directory. In addition to Oat integration, you can find examples of RL training with Verl [here](https://github.com/axon-rl/gem/tree/main/examples#training-with-verl). 

## Roadmap

As our next step, we plan to integrate the following environments (among others):

- [ ] Terminal-Bench

- [ ] SWE-Gym

- [ ] Multi-Agent Systems

- [ ] ...

## Contributing

We welcome all forms of contribution — from adding new environments to integrating additional training frameworks. We're planning to write a community-driven technical report, and major contributors will be recognized with authorship. Join [discord](https://discord.gg/AfXVkEphzD) to discuss more!

## Acknowledgement

* This work is supported by [Sea AI Lab](https://sail.sea.com/) for computing resources.

* Our code learns from and builds on several awesome projects such as [gym](https://github.com/openai/gym), [rllm](https://github.com/rllm-org/rllm), [TextArena](https://github.com/LeonGuertler/TextArena), [Search-R1](https://github.com/PeterGriffinJin/Search-R1), [ReasoningGym](https://github.com/open-thought/reasoning-gym).

* The training example code is built on [Oat](https://github.com/sail-sg/oat) and [Verl](https://github.com/volcengine/verl).
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/axon-rl/gem

Awesome Lists containing this project

README