https://github.com/tensorzero/llmgym
- Host: GitHub
- URL: https://github.com/tensorzero/llmgym
- Owner: tensorzero
- License: apache-2.0
- Created: 2025-01-14T20:09:02.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2025-04-01T23:19:30.000Z (3 months ago)
- Last Synced: 2025-04-01T23:30:33.996Z (3 months ago)
- Language: Python
- Size: 1.48 MB
- Stars: 2
- Watchers: 3
- Forks: 0
- Open Issues: 9
Metadata Files:
- Readme: README.md
- License: LICENSE
README
> [!IMPORTANT]
>
> **This repository is still under active development. Expect breaking changes.**

# LLM Gym
LLM Gym is a unified environment interface for developing and benchmarking LLM applications that learn from feedback. Think [gym](https://gymnasium.farama.org/) for LLM agents.
As the space of benchmarks rapidly grows, fair and comprehensive comparisons are getting trickier, so we aim to make that easier for you. The vision is an intuitive interface for a suite of environments you can seamlessly swap out for research and development purposes.
## Installation
Follow these steps to set up the development environment for LLM Gym using uv for virtual environment management and Hatch (with Hatchling) for building and packaging.
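Although the steps below only use uv directly, builds go through the Hatchling backend. If you want to produce a wheel or sdist yourself, uv can drive that backend for you (standard uv behavior, assuming the project's `pyproject.toml` declares Hatchling, as the paragraph above implies):

```bash
# Build an sdist and wheel via the build backend declared in pyproject.toml
uv build
```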
### Prerequisites
- Python 3.10 (or a compatible version, e.g., >=3.10, <4.0)
- [uv](https://docs.astral.sh/uv/getting-started/installation/) – an extremely fast Python package manager and virtual environment tool

### Steps
#### 1. Clone the Repository
Clone the repository to your local machine:
```bash
git clone git@github.com:tensorzero/llmgym.git
cd llmgym
```

#### 2. Create and Activate a Virtual Environment
Use uv to create a virtual environment. This command will create a new environment (by default in the `.venv` directory) using Python 3.10:
```bash
uv venv --python 3.10
```
Activate the virtual environment:
```bash
source .venv/bin/activate
```

#### 3. Install Project Dependencies
Install the project in editable mode:
```bash
uv pip install -e .
```

#### 4. Verify the Installation
To ensure everything is set up correctly, you can run the tests or simply import the package in Python.

Run tests:
```bash
uv run pytest
```

Import the package in Python:
```bash
python
>>> import llmgym
>>> llmgym.__version__
'0.0.0'
```
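Equivalently, as a one-liner (just a convenience; it checks the same `llmgym.__version__` attribute as above):

```bash
python -c "import llmgym; print(llmgym.__version__)"
```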
## Setting Environment Variables

To set the `OPENAI_API_KEY` environment variable, run the following command:
```bash
export OPENAI_API_KEY="your_openai_api_key"
```

We recommend using [direnv](https://direnv.net/) and creating a local `.envrc` file to manage environment variables. For example, the `.envrc` file might look like this:
```bash
export OPENAI_API_KEY="your_openai_api_key"
```

Then run `direnv allow` to load the environment variables.
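Concretely, from the repository root, the whole flow is just (a sketch of the standard direnv workflow; nothing here is repo-specific):

```bash
cd llmgym
echo 'export OPENAI_API_KEY="your_openai_api_key"' > .envrc
direnv allow  # direnv now exports the variable whenever you enter this directory
```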
## Quickstart
Start IPython with async support:
```bash
ipython --async=True
```
Run an episode of the 21-questions environment.
```python
import logging

import llmgym
from llmgym.agents import OpenAIAgent
from llmgym.logs import get_logger

logger = get_logger("llmgym")
logger.setLevel(logging.INFO)

env = llmgym.make("21_questions_v0")
agent = OpenAIAgent(
    model_name="gpt-4o-mini",
    function_configs=env.functions,
    tool_configs=env.tools,
)

# Get the default horizon
max_steps = env.horizon

# Reset the environment
reset_data = await env.reset()
obs = reset_data.observation

# Run the episode
for _step in range(max_steps):
    # Get an action from the agent
    action = await agent.act(obs)
    # Step the environment
    step_data = await env.step(action)
    obs = step_data.observation
    # Check if the episode is done
    done = step_data.terminated or step_data.truncated
    if done:
        break

env.close()
```

This can also be run in the [Quickstart Notebook](examples/quickstart.ipynb).
## Tutorial
For a full tutorial, see the [Tutorial Notebook](examples/tutorial.ipynb).
To see how to run multiple episodes concurrently, see the [Tau Bench](examples/tau_bench.ipynb) or [21 Questions](examples/21_questions.ipynb) notebooks.
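The basic idea is to wrap the quickstart loop in a coroutine and dispatch several of them with `asyncio.gather`. Here is a minimal sketch built only from the API shown above; creating a fresh environment and agent per episode is an assumption, and the notebooks show the repo's own pattern:

```python
import asyncio

import llmgym
from llmgym.agents import OpenAIAgent


async def run_episode(env_name: str) -> None:
    # Assumption: each episode gets its own environment and agent instance.
    env = llmgym.make(env_name)
    agent = OpenAIAgent(
        model_name="gpt-4o-mini",
        function_configs=env.functions,
        tool_configs=env.tools,
    )
    reset_data = await env.reset()
    obs = reset_data.observation
    for _step in range(env.horizon):
        action = await agent.act(obs)
        step_data = await env.step(action)
        obs = step_data.observation
        if step_data.terminated or step_data.truncated:
            break
    env.close()


async def main() -> None:
    # Run five episodes concurrently; gather awaits them all.
    await asyncio.gather(*(run_episode("21_questions_v0") for _ in range(5)))


asyncio.run(main())
```

Because this is a plain script driven by `asyncio.run`, it also works outside IPython.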
For a supervised finetuning example, see the [Supervised Finetuning Notebook](examples/supervised_fine_tuning.ipynb).