https://github.com/hud-evals/hud-python
OSS RL environment + evals toolkit
- Host: GitHub
- URL: https://github.com/hud-evals/hud-python
- Owner: hud-evals
- License: mit
- Created: 2025-03-02T04:05:49.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2026-04-01T03:37:42.000Z (3 months ago)
- Last Synced: 2026-04-01T04:26:27.593Z (3 months ago)
- Topics: grpo, llm, llms, lora, qwen, qwen3, reinforcement-learning, reinforcement-learning-environments, rl
- Language: Python
- Homepage: https://www.hud.ai
- Size: 62 MB
- Stars: 319
- Watchers: 3
- Forks: 54
- Open Issues: 7
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
HUD is a platform for building RL environments for AI agents. Define agent-callable tools, write evaluation scenarios, run evals at scale, and train models on the results.
To learn more, check out our [Documentation](https://docs.hud.ai) and [API Reference](https://docs.hud.ai/reference).
## Install
```bash
# Install CLI (recommended)
uv tool install hud-python --python 3.12
```
Get your API key at [hud.ai/project/api-keys](https://hud.ai/project/api-keys) and set it:
```bash
export HUD_API_KEY=your-key-here
```
> Or install as a library: `pip install hud-python`

## Environments
An environment is the harness an agent operates in. It packages tools (functions agents can call) and scenarios (how agents are evaluated) into a single deployable unit. Each environment spins up fresh and isolated for every evaluation.
```python
from hud import Environment

env = Environment("my-env")

@env.scenario("count")
async def count(word: str, letter: str):
    # PROMPT: send a question to the agent.
    # The agent runs its reasoning loop and returns an answer.
    answer = yield f"How many '{letter}' in '{word}'?"

    # SCORE: check the agent's answer against the correct count.
    # Return a reward: 1.0 for correct, 0.0 for wrong.
    correct = str(word.lower().count(letter.lower()))
    yield 1.0 if answer and correct in answer else 0.0
```
A scenario has two yields. The first sends a prompt — the agent runs between the yields, calling tools and reasoning. The second checks the result and returns a reward (0.0 to 1.0). → [Core Concepts](https://docs.hud.ai/concepts)
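To make the two-yield flow concrete, here is a minimal sketch in plain Python with no HUD imports. It drives the same `count` generator by hand. `run_scenario` and `answer_fn` are hypothetical names used only for illustration; this shows the async-generator mechanics, not HUD's actual runner.

```python
import asyncio


async def count(word: str, letter: str):
    # First yield: hand the prompt to the driver and wait for the answer.
    answer = yield f"How many '{letter}' in '{word}'?"
    # Second yield: score the answer against the true count.
    correct = str(word.lower().count(letter.lower()))
    yield 1.0 if answer and correct in answer else 0.0


async def run_scenario(scenario, answer_fn):
    """Hypothetical driver (an assumption, not part of HUD)."""
    prompt = await scenario.asend(None)    # advance to the first yield
    answer = answer_fn(prompt)             # stand-in for the agent loop
    reward = await scenario.asend(answer)  # resume and receive the reward
    await scenario.aclose()                # finish the generator cleanly
    return prompt, reward


prompt, reward = asyncio.run(
    run_scenario(count("strawberry", "r"), lambda _prompt: "There are 3.")
)
print(prompt, reward)
```

Note that the scoring line is a substring check, so an answer such as "13" would also earn full reward for a correct count of "3"; a stricter scenario might parse the number first.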
## Run an Agent
```python
import hud
from hud.agents import create_agent

# `env` is the Environment defined in the previous section.
task = env("count", word="strawberry", letter="r")
agent = create_agent("claude-sonnet-4-5")

# Run inside an async function or a notebook (top-level await).
async with hud.eval(task) as ctx:
    result = await agent.run(ctx)

print(f"Reward: {result.reward}")  # 1.0 if agent answers "3"
```
`create_agent()` picks the right agent class and native tools for each model. → [Environments](https://docs.hud.ai/quick-links/environments)
## Workflow
```bash
hud init my-env # Scaffold environment
cd my-env
hud dev env:env -w env.py # Run locally with hot-reload
hud eval tasks.py claude # Run evals locally
hud deploy # Deploy to platform
hud sync tasks my-taskset # Sync tasks to platform
```
Once deployed, run evals at scale from the CLI or the [platform UI](https://hud.ai):
```bash
hud eval my-taskset claude --remote --full
```
→ [Deploy](https://docs.hud.ai/quick-links/deploy) · [Testing & Evaluation](https://docs.hud.ai/advanced/testing-environments)
## Pre-built Tools
HUD ships tools for computer control, shell execution, file editing, browser automation, and web search. Add them to any environment:
```python
from hud.tools import AnthropicComputerTool, BashTool, EditTool
env.add_tool(AnthropicComputerTool()) # Mouse, keyboard, screenshots
env.add_tool(BashTool()) # Persistent bash shell
env.add_tool(EditTool()) # File viewing and editing
```
HUD adapts each tool to the model's native format — Claude gets `computer_20250124`, OpenAI gets `computer_use_preview`, Gemini gets `ComputerUse`. → [Tools Reference](https://docs.hud.ai/tools/computer)
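One way to picture this adaptation is a lookup from provider to native tool identifier. The sketch below uses only the three identifiers named above; the `native_computer_tool` helper and the provider keys are hypothetical and are not part of the HUD API.

```python
# Native tool identifiers quoted from the sentence above.
NATIVE_COMPUTER_TOOL = {
    "claude": "computer_20250124",
    "openai": "computer_use_preview",
    "gemini": "ComputerUse",
}


def native_computer_tool(provider: str) -> str:
    """Hypothetical helper: return the native computer-tool name for a provider."""
    try:
        return NATIVE_COMPUTER_TOOL[provider.lower()]
    except KeyError:
        raise ValueError(f"no native computer tool known for {provider!r}") from None


print(native_computer_tool("Claude"))
```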
## Model Gateway
Use Claude, GPT, Gemini, or Grok through one OpenAI-compatible endpoint:
```python
import os

from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="https://inference.hud.ai",
    api_key=os.environ["HUD_API_KEY"],
)

# Run inside an async function or a notebook (top-level await).
response = await client.chat.completions.create(
    model="claude-sonnet-4-5",  # or gpt-4o, gemini-2.5-pro (https://hud.ai/models)
    messages=[{"role": "user", "content": "Hello!"}],
)
```
Every call is traced at [hud.ai](https://hud.ai). → [Models](https://docs.hud.ai/quick-links/models)
## Links
- 📖 [Documentation](https://docs.hud.ai)
- ⌨️ [CLI Reference](https://docs.hud.ai/reference/cli/overview)
- 🏆 [Leaderboards](https://hud.ai/leaderboards)
- 🌐 [Environment Templates](https://hud.ai/environments)
- 🤖 [Supported Models](https://hud.ai/models)
- 💬 [Discord](https://discord.gg/wkjtmHYYjm)
## Enterprise
Building agents at scale? We work with teams on custom environments, benchmarks, and training.
[📅 Book a call](https://cal.com/jay-hud) · [📧 founders@hud.ai](mailto:founders@hud.ai)
## Contributing
We welcome contributions! See [CONTRIBUTING.md](CONTRIBUTING.md).
Key areas: [Agents](hud/agents/) · [Tools](hud/tools/) · [Environments](https://hud.ai/environments)
## Citation
```bibtex
@software{hud2025agentevalplatform,
  author = {HUD and Jay Ram and Lorenss Martinsons and Parth Patel and Govind Pimpale and Dylan Bowman and Jaideep and Nguyen Nhat Minh},
  title = {HUD: An Evaluation and RL Environments Platform for Agents},
  date = {2025-04},
  url = {https://github.com/hud-evals/hud-python},
  langid = {en}
}
```
MIT License · [LICENSE](LICENSE)