{"id":31570435,"url":"https://github.com/axon-rl/gem","last_synced_at":"2025-10-05T12:23:01.523Z","repository":{"id":307724954,"uuid":"991116423","full_name":"axon-rl/gem","owner":"axon-rl","description":"A Gym for Agentic LLMs","archived":false,"fork":false,"pushed_at":"2025-10-04T07:01:44.000Z","size":696,"stargazers_count":176,"open_issues_count":8,"forks_count":10,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-10-04T09:07:32.265Z","etag":null,"topics":["gym","llm","rl"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/axon-rl.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-05-27T06:38:29.000Z","updated_at":"2025-10-04T07:33:30.000Z","dependencies_parsed_at":"2025-08-20T17:29:37.129Z","dependency_job_id":"af2a1127-c331-4d0e-adc4-c0840aedf0e8","html_url":"https://github.com/axon-rl/gem","commit_stats":null,"previous_names":["axon-rl/gem"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/axon-rl/gem","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/axon-rl%2Fgem","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/axon-rl%2Fgem/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/axon-rl%2Fgem/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/axon-rl%2Fgem/manifests","owner_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/owners/axon-rl","download_url":"https://codeload.github.com/axon-rl/gem/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/axon-rl%2Fgem/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278290671,"owners_count":25962595,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-04T02:00:05.491Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["gym","llm","rl"],"created_at":"2025-10-05T12:23:00.916Z","updated_at":"2025-10-05T12:23:01.518Z","avatar_url":"https://github.com/axon-rl.png","language":"Python","funding_links":[],"categories":["TL;DR — pick the right framework"],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\n# GEM: A Gym for Agentic LLMs\n\n\n[![Notion blog](https://img.shields.io/badge/Notion-000000?style=for-the-badge\u0026logo=notion\u0026logoColor=white)](https://axon-rl.notion.site/gem) \n[![🌐 Axon-RL](https://img.shields.io/badge/-AxonRL%20project-5865F2?style=for-the-badge)](https://axon-rl.github.io/) \n[![Hugging Face Collection](https://img.shields.io/badge/AxonRL-fcd022?style=for-the-badge\u0026logo=huggingface\u0026logoColor=000\u0026labelColor)](https://huggingface.co/axon-rl) 
\n[![Documentation](https://img.shields.io/badge/Documentation-blue?style=for-the-badge\u0026logo=readthedocs\u0026logoColor=white)](https://axon-rl.github.io/gem/)\n\n\u003cdiv align=\"center\" style=\"font-family: Arial, sans-serif;\"\u003e\n  \u003cp\u003e\n    \u003ca href=\"#links\" style=\"text-decoration: none; font-weight: bold;\"\u003eLinks\u003c/a\u003e •\n    \u003ca href=\"#installation\" style=\"text-decoration: none; font-weight: bold;\"\u003eInstallation\u003c/a\u003e •\n    \u003ca href=\"#interface\" style=\"text-decoration: none; font-weight: bold;\"\u003eInterface\u003c/a\u003e •\n    \u003ca href=\"#integration-examples\" style=\"text-decoration: none; font-weight: bold;\"\u003eIntegration Examples\u003c/a\u003e •\n    \u003ca href=\"#roadmap\" style=\"text-decoration: none; font-weight: bold;\"\u003eRoadmap\u003c/a\u003e •\n    \u003ca href=\"#contributing\" style=\"text-decoration: none; font-weight: bold;\"\u003eContributing\u003c/a\u003e •\n    \u003ca href=\"#acknowledgement\" style=\"text-decoration: none; font-weight: bold;\"\u003eAcknowledgement\u003c/a\u003e\n  \u003c/p\u003e\n\u003c/div\u003e\n\n\u003c/div\u003e\n\nWe’re entering the era of experience, where LLM training moves beyond static datasets, towards LLM agents learning from experience gathered in complex, expressive environments. As a step towards this we introduce **GEM**, our open-source **G**eneral **E**xperience **M**aker.\n\nLike OpenAI [Gym](https://github.com/openai/gym) for traditional RL, GEM is a dedicated environment simulator for the age of LLMs. GEM offers a diverse range of environments with clean, standardized interfaces, making it easy to integrate with existing RL training frameworks (Oat, Verl, etc.). 
In addition, GEM features tool integration, flexible and easy-to-modify wrappers, async vectorized environment execution to maximize throughput, multi-environment training, and more … everything you need to make LLM agent RL training simple.\n\n\n## Links\n  * 📜 [Initial Blog](https://axon-rl.notion.site/gem)\n  * 🚀 [Blog release tweet](https://x.com/zzlccc/status/1951358948587741295)\n  * 📄 [Paper](https://arxiv.org/pdf/2510.01051)\n  * 📄 [Documentation](https://axon-rl.github.io/gem/)\n\n## Installation\n\nInstall `GEM` from PyPI:\n\n```bash\npip install -U gem-llm\n```\n\nTo use the `search` tool, run the following to install extra dependencies: \n```bash\npip install -U 'gem-llm[search]'\nconda install -c pytorch -c nvidia faiss-gpu=1.8.0\n```\n\nTo use the `mcp` tool and [MCPMark](https://mcpmark.ai/) environment, run the following to install extra dependencies: \n```bash\npip install -U 'gem-llm[mcp]'\n\n# install MCPMark\ngit clone git@github.com:axon-rl/mcpmark.git; cd mcpmark\npip install -e .\nplaywright install # If you'll use browser-based tasks, install Playwright browsers first\n```\n\n## Interface\nGEM's interface closely follows Gym's API. 
Here's an example using the \"game:GuessTheNumber-v0\" environment: \n\n```python \nimport gem\n\n# List all supported environments\ngem.print_envs()\n\n# Initialize the environment\nenv = gem.make(\"game:GuessTheNumber-v0\")\n\n# Reset the environment to generate the first observation\nobservation, info = env.reset()\n\n# Start the agent-environment loop\nwhile True:\n    action = env.sample_random_action() # insert policy here, e.g.,\n    # (pseudocode) action = llm.generate(observation)\n\n    # apply action and receive next observation, reward\n    # and whether the episode has ended\n    next_observation, reward, terminated, truncated, info = env.step(action)\n    print(\"OBS\", observation)\n    print(\"ACT\", action)\n\n    # update the policy (online) here\n    # e.g., policy = learn(policy, observation, action, reward, info)\n\n    observation = next_observation\n    # Exit when the episode terminates\n    if terminated or truncated:\n        break\n```\n\n### Tool Integration Examples\n\nBelow are examples for enabling tools within environments.\n\n**Example using the Python tool:**\n```python\nfrom transformers import AutoTokenizer\n\nimport gem\nfrom gem.tools.python_code_tool import PythonCodeTool\nfrom gem.tools.tool_env_wrapper import ToolEnvWrapper\nfrom gem.wrappers.wrapper_factory import WRAPPER_FACTORY\n\nenv = gem.make(\"math:GSM8K\")\ntool = PythonCodeTool()\nwrapped_env = ToolEnvWrapper(env, tools=[tool])\nwrapped_env = WRAPPER_FACTORY[\"concat_chat\"](\n    wrapped_env, tokenizer=AutoTokenizer.from_pretrained(\"Qwen/Qwen3-0.6B\")\n)\nobs, info = wrapped_env.reset()\n\n# we ignore the obs and use a dummy action\ndummy_action = \"\u003cthink\u003eLet me compare 9.9 and 9.11 using python.\u003c/think\u003e\u003cpython\u003eprint('9.9 \u003e 9.11?', 9.9 \u003e 9.11)\u003c/python\u003e\"\nobs, reward, terminated, truncated, info = wrapped_env.step(dummy_action)\nprint(obs)\n# continue to sample the next response given the tool results 
...\n\nwrapped_env.close()\n```\n\n**Example using the search tool:**\n```python\n# assume you have a search server running\n\nenv = gem.make(\"game:GuessTheNumber-v0\", max_turns=2)\ntool = SearchTool(search_url=\"http://localhost:8000/retrieve\", topk=2)\nwrapped_env = ToolEnvWrapper(env, tools=[tool], max_tool_uses=1)\nwrapped_env = WRAPPER_FACTORY['concat_chat'](wrapped_env, tokenizer=AutoTokenizer.from_pretrained(\"Qwen/Qwen3-0.6B\"))\nwrapped_env.reset()\n\ndummy_action = \"\u003cthink\u003eI need to search for Python list comprehension examples\u003c/think\u003e\u003csearch\u003ePython list comprehension examples\u003c/search\u003e\"\nobs, reward, terminated, truncated, info = wrapped_env.step(dummy_action)\nprint(obs)\n```\n\n\u003cdetails\u003e\n\u003csummary\u003eClick to get the complete runnable code\u003c/summary\u003e\n\n```python\nimport subprocess\nimport time\n\nfrom transformers import AutoTokenizer\n\nimport gem\nfrom gem.tools.search_tool import SearchTool\nfrom gem.tools.tool_env_wrapper import ToolEnvWrapper\nfrom gem.wrappers.wrapper_factory import WRAPPER_FACTORY\n\n# start the search server\nserp_api_key = \"add your API key\" # get a key at https://serpapi.com/manage-api-key\nserver_process = subprocess.Popen([\n    'python', '-m', 'gem.tools.search_engine.serp_search_server',\n    '--search_url', 'https://serpapi.com/search',\n    '--topk', '2', '--serp_api_key', serp_api_key\n])\ntime.sleep(5)\n\n# interact using search tool\nenv = gem.make(\"game:GuessTheNumber-v0\", max_turns=2)\ntool = SearchTool(search_url=\"http://localhost:8000/retrieve\", topk=2)\nwrapped_env = ToolEnvWrapper(env, tools=[tool], max_tool_uses=1)\nwrapped_env = WRAPPER_FACTORY['concat_chat'](wrapped_env, tokenizer=AutoTokenizer.from_pretrained(\"Qwen/Qwen3-0.6B\"))\nwrapped_env.reset()\n\ndummy_action = \"\u003cthink\u003eI need to search for Python list comprehension examples\u003c/think\u003e\u003csearch\u003ePython list comprehension 
examples\u003c/search\u003e\"\nobs, reward, terminated, truncated, info = wrapped_env.step(dummy_action)\nprint(obs)\n```\n\u003c/details\u003e\n\n## Integration Examples\n\nWe demonstrate how to leverage existing LLM RL infrastructure to train agents with GEM. First, we show how to train game agents using [Oat](https://github.com/sail-sg/oat). \n\nBefore running the training, ensure you set up the development environment by following the [instructions](https://github.com/axon-rl/gem/tree/main/examples#training-with-oat). \n\nRun the following command to train an agent for the game environment `game:GuessTheNumber-v0`: \n\n```bash\npython train.py \\\n    --env_id game:GuessTheNumber-v0 \\\n    --wrappers concat \\\n    --gamma 0.9 \\\n    --norm_adv \\\n    --gpus 8 \\\n    --gradient-checkpointing \\\n    --num_samples 1 \\\n    --rollout_batch_size 128 \\\n    --num_envs 2 \\\n    --rollout_batch_size_per_device 16 \\\n    --pi_buffer_maxlen_per_device 16 \\\n    --pretrain Qwen/Qwen3-1.7B-Base \\\n    --enable_prefix_caching \\\n    --collocate \\\n    --vllm_sleep \\\n    --vllm_gpu_ratio 0.45 \\\n    --rnd-seed \\\n    --learning_rate 0.000001 \\\n    --lr_scheduler constant \\\n    --lr_warmup_ratio 0 \\\n    --num_ppo_epochs 2 \\\n    --train_batch_size 128 \\\n    --train_batch_size_per_device 1 \\\n    --beta 0 \\\n    --max_model_len 12800 \\\n    --generate_max_length 4096 \\\n    --temperature 1.0 \\\n    --top_p 1 \\\n    --eval_steps -1 \\\n    --save_steps -1 \\\n    --eval_temperature 0.6 \\\n    --eval_top_p 0.95 \\\n    --eval_generate_max_length 4096 \\\n    --max_train 65000 \\\n    --max_save_num 30 \\\n    --use-wb \\\n    --wb-run-name oat-qwen3-1.7b-base-game:GuessTheNumber-v0 \\\n    --wb_project gem \\\n    --debug\n```\n\n\nWe also provide sample code for math, code, and general QA in the [examples](https://github.com/axon-rl/gem/tree/main/examples) directory. 
In addition to Oat integration, you can find examples of RL training with Verl [here](https://github.com/axon-rl/gem/tree/main/examples#training-with-verl). \n\n## Roadmap\n\nAs our next step, we plan to integrate the following environments (among others):\n- [ ] Terminal-Bench\n- [ ] SWE-Gym\n- [ ] Multi-Agent Systems\n- [ ] ...\n\n## Contributing\n\nWe welcome all forms of contribution, from adding new environments to integrating additional training frameworks. We're planning to write a community-driven technical report, and major contributors will be recognized with authorship. Join our [Discord](https://discord.gg/AfXVkEphzD) to discuss further!\n\n## Acknowledgement\n* [Sea AI Lab](https://sail.sea.com/) supports this work with computing resources.\n* Our code learns from and builds on several awesome projects such as [gym](https://github.com/openai/gym), [rllm](https://github.com/rllm-org/rllm), [TextArena](https://github.com/LeonGuertler/TextArena), [Search-R1](https://github.com/PeterGriffinJin/Search-R1), and [ReasoningGym](https://github.com/open-thought/reasoning-gym).\n* The training example code is built on [Oat](https://github.com/sail-sg/oat) and [Verl](https://github.com/volcengine/verl).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faxon-rl%2Fgem","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faxon-rl%2Fgem","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faxon-rl%2Fgem/lists"}