https://github.com/svilupp/layercode-gym

Unofficial utilities for Layercode Voice Agents. Run hundreds of voice AI conversations concurrently. Test with text, audio files, or AI-driven personas.
Host: GitHub
URL: https://github.com/svilupp/layercode-gym
Owner: svilupp
License: mit
Created: 2025-11-02T07:52:03.000Z (7 months ago)
Default Branch: main
Last Pushed: 2026-02-02T21:09:09.000Z (4 months ago)
Last Synced: 2026-02-03T10:34:23.642Z (4 months ago)
Topics: evals, generative-ai, layercode, voice-ai-agents
Language: Python
Homepage: http://siml.earth/layercode-gym/
Size: 521 KB
Stars: 1
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
- Roadmap: docs/roadmap.md

# LayerCode Gym

[![CI](https://github.com/svilupp/layercode-gym/actions/workflows/ci.yml/badge.svg)](https://github.com/svilupp/layercode-gym/actions/workflows/ci.yml)

[![Docs](https://github.com/svilupp/layercode-gym/actions/workflows/docs.yml/badge.svg)](https://github.com/svilupp/layercode-gym/actions/workflows/docs.yml)

[![Documentation](https://img.shields.io/badge/docs-latest-blue.svg)](https://svilupp.github.io/layercode-gym)

A testing toolkit for voice AI agents built on [Layercode.com](https://layercode.com). Quickly spin up a testing environment to run through hundreds of scenarios and understand how your agent will perform in production.

**Note:** This is an unofficial, community-maintained project.

Perfect for regression testing, load testing, and automated evaluation of your voice AI agents.

## Features

- **Three User Simulator Types**: Fixed text, pre-recorded audio, or AI-driven personas

- **Smart Wait Handling**: AI personas intelligently wait when assistants need processing time

- **Captured Analytics**: Full transcripts with TTFAB, latency stats, and audio recordings

- **LogFire Integration**: Real-time observability and debugging

- **Batch Testing**: Run hundreds of conversations concurrently

- **CLI & Python API**: Quick testing via CLI or programmatic control, plus `api-agents` CLI to swap webhook URLs for CI

- **LLM-as-Judge**: Plug in your own quality evaluation with customizable criteria, run as a conversation callback

- **GitHub Actions Integration**: Automated CI/CD testing with parallel persona execution

See `examples/` for reference!

## Quick Start

**Prerequisites:** Backend server configured in [Layercode dashboard](https://dash.layercode.com).

No server yet? Launch one quickly:

```bash

uvx layercode-create-app run --tunnel --unsafe-update-webhook

# Displays tunnel URL to enter in Layercode dashboard

```

> **Caution:** `--unsafe-update-webhook` automatically updates the webhook URL in the Layercode dashboard!

### CLI Quick Test (No Installation)

```bash

# Set environment

export SERVER_URL="http://localhost:8001"

export LAYERCODE_AGENT_ID="your_agent_id"

# Run instantly with uvx (no installation)

uvx layercode-gym run --text "Hello, I need help with my account"

# Multiple messages

uvx layercode-gym run --text "Hi" --text "Tell me more" --text "Goodbye"

# Audio file

uvx layercode-gym run --file recording.wav

# AI agent with persona

uvx layercode-gym run --agent \
  --persona-background "You are a frustrated customer" \
  --persona-intent "Cancel your subscription"

```

Run `uvx layercode-gym --help` to see available commands, or `uvx layercode-gym run --help` for all run options.

### Manage Agent Webhooks (for CI)

```bash

# List all agents

uvx layercode-gym api-agents list

# Get agent details (use --json for full pipeline config)

uvx layercode-gym api-agents get --agent-id ag-123

# Update webhook URL (useful for PR testing)

uvx layercode-gym api-agents update --agent-id ag-123 --webhook-url https://pr-backend.com/webhook

```

### Cloudflare Tunnel (for Local Development)

Quickly expose your local server to the internet with a Cloudflare tunnel. This is useful for testing webhooks without deploying your backend.

**Requires:** [cloudflared](https://developers.cloudflare.com/cloudflare-one/connections/connect-apps/install-and-setup/installation/) to be installed.

```bash

# Basic tunnel - displays URL to copy manually

uvx layercode-gym tunnel --port 8000

# Or specify a full URL directly

uvx layercode-gym tunnel --url http://localhost:8000

# Auto-update agent webhook (recommended for development)

uvx layercode-gym tunnel --port 8000 --unsafe-update-webhook

# Explicit agent ID (overrides LAYERCODE_AGENT_ID env var)

uvx layercode-gym tunnel --port 8000 --agent-id ag-123456 --unsafe-update-webhook

```

When using `--unsafe-update-webhook`:

1. The tunnel starts and gets a base URL (e.g., `https://random-words.trycloudflare.com`)

2. The agent path is appended to create the full webhook URL (e.g., `https://random-words.trycloudflare.com/api/agent`)

3. Your agent's webhook URL is automatically updated

4. When you stop the tunnel (Ctrl+C), the original webhook URL is restored

**Agent path resolution:** `--agent-path` flag → `LAYERCODE_AGENT_PATH` env var → path from existing webhook → default `/api/agent`
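
The precedence above can be sketched as a small function. This is an illustrative sketch, not the tool's actual implementation: `resolve_agent_path` and `build_webhook_url` are hypothetical names, and only the ordering (flag, then `LAYERCODE_AGENT_PATH`, then the path of the existing webhook, then `/api/agent`) comes from the description above.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlsplit

DEFAULT_AGENT_PATH = "/api/agent"


def resolve_agent_path(
    flag_value: Optional[str],
    existing_webhook_url: Optional[str],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick the agent path using the documented precedence (hypothetical helper)."""
    env = os.environ if env is None else env
    # 1. An explicit --agent-path flag wins.
    if flag_value:
        return flag_value
    # 2. Then the LAYERCODE_AGENT_PATH environment variable.
    env_value = env.get("LAYERCODE_AGENT_PATH")
    if env_value:
        return env_value
    # 3. Then the path component of the webhook that is currently configured.
    if existing_webhook_url:
        path = urlsplit(existing_webhook_url).path
        if path and path != "/":
            return path
    # 4. Otherwise fall back to the documented default.
    return DEFAULT_AGENT_PATH


def build_webhook_url(tunnel_base_url: str, agent_path: str) -> str:
    """Join the tunnel base URL and agent path without doubling slashes."""
    return tunnel_base_url.rstrip("/") + "/" + agent_path.lstrip("/")


if __name__ == "__main__":
    path = resolve_agent_path(None, "https://old.example.com/api/agent", env={})
    print(build_webhook_url("https://random-words.trycloudflare.com", path))
```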

**Environment Variables:**

- `LAYERCODE_AGENT_ID` - Default agent ID for webhook updates

- `LAYERCODE_API_KEY` - API key for webhook updates (required for `--unsafe-update-webhook`)

- `LAYERCODE_AGENT_PATH` - Path to append to tunnel URL (default: extracted from existing webhook, or `/api/agent`)

> **Warning:** `--unsafe-update-webhook` modifies your agent's configuration. Only use with development/test agents, not production.

See [tunnel documentation](docs/tunnel.md) for more details.

### Python API

```bash

# Install

uv add layercode-gym

# Set environment

export SERVER_URL="http://localhost:8001"

export LAYERCODE_AGENT_ID="your_agent_id"

export OPENAI_API_KEY="sk-..."  # For TTS and AI personas

```

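```python
import asyncio

from layercode_gym import LayercodeClient, UserSimulator


async def main() -> None:
    # Simple text messages
    simulator = UserSimulator.from_text(
        messages=["Hello!", "Tell me about pricing", "Thank you"],
        send_as_text=True,
    )
    client = LayercodeClient(simulator=simulator)
    conversation_id = await client.run()
    print(conversation_id)


# client.run() is a coroutine, so drive it from an event loop
asyncio.run(main())
```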

## Architecture

```
┌─────────────┐                     ┌──────────────┐
│  Your Test  │──1. Authorize──────▶│ Your Backend │
│    Code     │                     │   Server     │
└─────────────┘                     └──────────────┘
       │                                    │
       │                             2. Return
       │                           client_session_key
       │                                    │
       └──────3. Connect with key───────────┘
                      │
                      ▼
              ┌──────────────┐
              │  Layercode   │
              │   Platform   │
              └──────────────┘
```

**Flow:**

1. Client authorizes through YOUR backend server (`SERVER_URL`)

2. Backend returns `client_session_key` from LayerCode

3. Client connects to LayerCode WebSocket with that key

The client never requests a session key from LayerCode directly; it always authorizes through your backend first and only then opens the WebSocket connection.
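
To make the three steps concrete, here is a toy, in-memory model of the handshake. Everything in it is a stand-in: `FakePlatform`, `FakeBackend`, and `run_flow` are hypothetical names, no network calls are made, and the real endpoints and payloads may differ. It only shows who issues the `client_session_key` and who consumes it.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class FakePlatform:
    """Stand-in for the Layercode platform: issues and accepts session keys."""

    issued_keys: Set[str] = field(default_factory=set)

    def issue_session_key(self, agent_id: str) -> str:
        key = f"{agent_id}:{secrets.token_hex(8)}"
        self.issued_keys.add(key)
        return key

    def connect(self, client_session_key: str) -> bool:
        # Step 3: a connection is accepted only with a key the platform issued.
        return client_session_key in self.issued_keys


@dataclass
class FakeBackend:
    """Stand-in for your backend server (what SERVER_URL points at)."""

    platform: FakePlatform

    def authorize(self, agent_id: str) -> Dict[str, str]:
        # Steps 1-2: the backend obtains the key and hands it to the client.
        return {"client_session_key": self.platform.issue_session_key(agent_id)}


def run_flow(backend: FakeBackend, platform: FakePlatform, agent_id: str) -> bool:
    """Client side: authorize via the backend, then connect with the returned key."""
    session = backend.authorize(agent_id)
    return platform.connect(session["client_session_key"])


if __name__ == "__main__":
    platform = FakePlatform()
    print(run_flow(FakeBackend(platform), platform, "your_agent_id"))
```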

## User Simulators

Three types for different testing needs:

### 1. Fixed Text Messages

Fastest option, perfect for regression testing:

```python

simulator = UserSimulator.from_text(

    messages=["Hello", "Tell me more", "Goodbye"],

    send_as_text=True  # or False to use TTS

)

```

### 2. Pre-recorded Audio Files

Test transcription and audio handling:

```python

from pathlib import Path

simulator = UserSimulator.from_files(

    files=[Path("greeting.wav"), Path("question.wav")]

)

```

### 3. AI Agent Personas

Realistic, dynamic conversations using PydanticAI:

```python

from layercode_gym import Persona

simulator = UserSimulator.from_agent(

    persona=Persona(

        background_context="You are a 35-year-old small business owner",

        intent="You want to understand pricing and features"

    ),

    model="openai:gpt-5-mini",

    max_turns=5

)

```

## Examples

The `examples/` directory contains ready-to-run scripts:

- **01_text_messages.py** - Simple text conversation for quick testing

- **02_audio_file.py** - Stream pre-recorded audio to test transcription

- **03_agent_persona.py** - AI-driven user with dynamic responses

- **04_callbacks_judge.py** - CriteriaJudge for automated pass/fail evaluation

- **05_batch_evaluation.py** - Run multiple conversations concurrently

- **06_outdoor_shop_eval.py** - Custom data processor with domain-specific criteria

- **07_custom_judge.py** - Build your own judge with custom PydanticAI output types

- **08_long_running_task.py** - Testing agents with wait handling for slow operations

Run any example:

```bash

python examples/01_text_messages.py

```

See [full documentation](https://svilupp.github.io/layercode-gym/examples) for detailed explanations.

## LLM-as-Judge Evaluation

Evaluate conversations against pass/fail criteria using `CriteriaJudge`:

```python

from layercode_gym import CriteriaJudge, LayercodeClient, Settings

judge = CriteriaJudge(

    criteria=[

        "Did the agent answer all user questions?",

        "Was the agent polite and professional?",

        "Did the conversation flow naturally?"

    ],

    # Note: gpt-5-mini is fast/cheap for testing; use gpt-5 for production

    model="openai:gpt-5-mini"

)

async def on_end(log):

    result = await judge.evaluate(log)

    print(f"Overall: {'PASS' if result.overall_pass else 'FAIL'}")

    judge.save_results(result, log.conversation_id, Settings.load().output_root)

client = LayercodeClient(

    simulator=simulator,

    conversation_callback=on_end

)

```

Results are saved to `conversations//judge_evaluation.json` with full evaluation metadata:

```json

{

  "schema_version": "1.0",

  "evaluated_at": "2025-12-05T13:15:41.124793+00:00",

  "model": "openai:gpt-5-mini",

  "criteria": [{"id": 1, "criterion": "Did the agent answer all user questions?"}],

  "additional_context": "Optional context provided to the judge",

  "judgment": {

    "criteria_results": [{"criterion_id": 1, "passed": true}],

    "overall_pass": true,

    "reasoning": "The agent answered all questions clearly..."

  },

  "results_summary": [{"id": 1, "criterion": "...", "passed": true}]

}

```
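
Because the file is plain JSON, it can be post-processed without the library. The sketch below assumes only the fields shown in the sample above (`criteria`, `judgment.criteria_results`, `judgment.overall_pass`); `failed_criteria` is a hypothetical helper name, and the inline sample stands in for a file you would normally read from disk.

```python
import json
from typing import Any, Dict, List


def failed_criteria(evaluation: Dict[str, Any]) -> List[str]:
    """Return the text of every criterion the judge marked as not passed."""
    text_by_id = {c["id"]: c["criterion"] for c in evaluation["criteria"]}
    return [
        text_by_id[result["criterion_id"]]
        for result in evaluation["judgment"]["criteria_results"]
        if not result["passed"]
    ]


# Inline sample shaped like the schema above; in practice, json.load() the saved file.
SAMPLE = json.loads(
    """
    {
      "criteria": [
        {"id": 1, "criterion": "Did the agent answer all user questions?"},
        {"id": 2, "criterion": "Was the agent polite and professional?"}
      ],
      "judgment": {
        "criteria_results": [
          {"criterion_id": 1, "passed": true},
          {"criterion_id": 2, "passed": false}
        ],
        "overall_pass": false
      }
    }
    """
)

if __name__ == "__main__":
    for criterion in failed_criteria(SAMPLE):
        print(f"FAILED: {criterion}")
```

A helper like this is convenient in CI, where a non-empty list can be turned into a failing exit code.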

## Batch Testing

Run hundreds of conversations concurrently:

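```python
import asyncio

from tqdm.asyncio import tqdm_asyncio

# run_conversation is a coroutine you define to run one conversation.
# The await below must sit inside an async function (or an async REPL/notebook).
scenarios = ["Message 1", "Message 2", "Message 3"]
tasks = [run_conversation(msg) for msg in scenarios]
results = await tqdm_asyncio.gather(*tasks, desc="Running conversations")
```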

See `examples/05_batch_evaluation.py` for the complete pattern.
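
When the batch grows to hundreds of conversations, it can help to cap how many run at once. The following is a minimal sketch using only `asyncio` from the standard library; `run_conversation` here is a hypothetical stand-in that just sleeps, and `max_concurrent` is an illustrative parameter, not a library setting.

```python
import asyncio
from typing import List, Sequence, Union


async def run_conversation(message: str) -> str:
    """Hypothetical stand-in for one conversation; swap in a real client run."""
    await asyncio.sleep(0.01)
    return f"done: {message}"


async def run_batch(
    messages: Sequence[str], max_concurrent: int = 10
) -> List[Union[str, BaseException]]:
    """Run all conversations, with at most max_concurrent in flight at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(message: str) -> str:
        async with semaphore:
            return await run_conversation(message)

    # return_exceptions=True returns failures as values instead of raising
    # on the first one, so every scenario gets a result slot.
    return await asyncio.gather(
        *(guarded(m) for m in messages), return_exceptions=True
    )


if __name__ == "__main__":
    scenarios = [f"Message {i}" for i in range(25)]
    results = asyncio.run(run_batch(scenarios, max_concurrent=5))
    print(f"{len(results)} conversations finished")
```

`asyncio.gather` preserves input order, so `results[i]` corresponds to `scenarios[i]` even though conversations finish at different times.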

## GitHub Actions CI/CD

Run automated tests in your CI pipeline with multiple personas in parallel:

```yaml

- uses: ./.github/actions/layercode-gym-test

  with:

    personas: |

      - background: You are a potential customer

        intent: Learn about pricing and features

      - background: You are a frustrated user

        intent: Get help with a problem

    judge-enabled: true

    judge-criteria: |

      - Did the agent provide clear and helpful responses?

    server-url: ${{ secrets.SERVER_URL }}

    layercode-agent-id: ${{ secrets.LAYERCODE_AGENT_ID }}

    openai-api-key: ${{ secrets.OPENAI_API_KEY }}

```

**Features:**

- Run multiple personas in parallel for maximum speed

- Automated quality evaluation with LLM judge

- Detailed artifacts with transcripts and audio recordings

- Optional LogFire observability integration

**Tip:** Use the `api-agents` CLI to update your agent's webhook URL for PR testing:

```bash

# Point agent to PR-specific backend before running tests

layercode-gym api-agents update --agent-id ag-123 --webhook-url https://pr-456.example.com/webhook

# Restore original after tests

layercode-gym api-agents update --agent-id ag-123 --webhook-url https://production.example.com/webhook

```
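
If the swap is driven from Python rather than shell, a context manager is one way to guarantee the restore step runs even when tests fail. This is a sketch under assumptions: `temporary_webhook` is a hypothetical helper, and the `get_webhook` / `set_webhook` callables are placeholders for whatever you use to read and update the agent (for example, a wrapper around the `api-agents` commands). The demo uses an in-memory dict instead of a real agent.

```python
from contextlib import contextmanager
from typing import Callable, Dict, Iterator


@contextmanager
def temporary_webhook(
    get_webhook: Callable[[], str],
    set_webhook: Callable[[str], None],
    temporary_url: str,
) -> Iterator[None]:
    """Point the agent at temporary_url, then restore the original URL on exit."""
    original_url = get_webhook()
    set_webhook(temporary_url)
    try:
        yield
    finally:
        # Runs even if the body raises, so the agent is not left on the PR backend.
        set_webhook(original_url)


# In-memory stand-in for an agent's configuration (illustration only).
agent_config: Dict[str, str] = {"webhook_url": "https://production.example.com/webhook"}


def read_url() -> str:
    return agent_config["webhook_url"]


def write_url(url: str) -> None:
    agent_config["webhook_url"] = url


if __name__ == "__main__":
    with temporary_webhook(read_url, write_url, "https://pr-456.example.com/webhook"):
        print("during tests:", read_url())
    print("after tests:", read_url())
```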

See [GitHub Actions documentation](docs/github-action.md) for complete setup guide, or [`api-agents` CLI docs](docs/api-agents.md) for webhook management.

## Conversation Outputs

After each conversation:

```
conversations//
├── transcript.json          # Full log with timing metrics
├── conversation_mix.wav     # Combined audio (user + assistant)
├── user_0.wav               # Individual user turns
├── assistant_0.wav          # Individual assistant turns
└── judge_evaluation.json    # CriteriaJudge results (if enabled)
```

Transcript includes TTFAB, latency stats, turn counts, and full message history.

## Custom Implementations

### Custom TTS Engine

```python
from pathlib import Path

from layercode_gym import UserSimulator
from layercode_gym.simulator import TTSEngineProtocol


class MyTTSEngine(TTSEngineProtocol):
    async def synthesize(self, text: str, **kwargs) -> Path:
        # Call your TTS service (ElevenLabs, Azure, etc.), write the audio
        # to disk, and return the path of the generated file.
        audio_file_path = Path("tts_output.wav")  # placeholder path
        return audio_file_path


simulator = UserSimulator.from_text(
    messages=["Hello!"],
    send_as_text=False,
    tts_engine=MyTTSEngine(),
)
```

### Custom LLM for Agents

Use any LLM supported by PydanticAI. **Important:** You must define the system prompt with proper placeholders.

```python

from pydantic_ai import Agent

from textprompts import TextTemplates

# Load the required prompt template

templates = TextTemplates("src/layercode_gym/simulator/prompts")

system_prompt = templates.render(

    "basic_agent.txt",

    background_context="Your background",

    intent="Your intent"

)

# Create custom agent with proper system prompt

agent = Agent(

    "anthropic:claude-3-5-sonnet",

    system_prompt=system_prompt

)

simulator = UserSimulator.from_agent(agent=agent, deps=my_deps)

```

**Available models:**

- `openai:gpt-5` / `openai:gpt-5-mini`

- `anthropic:claude-3-5-sonnet`

- `ollama:llama3` (local)

- `gemini:gemini-1.5-pro`

**Prompt requirements:** The system prompt must include `{background_context}` and `{intent}` placeholders. See `src/layercode_gym/simulator/prompts/basic_agent.txt` for the default template.

### Custom Simulator

Full control via protocol implementation:

```python

from layercode_gym.simulator import UserSimulatorProtocol, UserRequest, UserResponse

class MyCustomSimulator(UserSimulatorProtocol):

    async def get_response(self, request: UserRequest) -> UserResponse | None:

        # Your logic here

        return UserResponse(text="Hello!", audio_path=None, data=())

```

## Environment Variables

**Required:**

```bash

SERVER_URL="http://localhost:8001"       # Your backend server

LAYERCODE_AGENT_ID="your_agent_id"       # LayerCode agent ID

```

**Optional:**

```bash

OPENAI_API_KEY="sk-..."                  # For TTS and AI agents

OPENAI_TTS_MODEL="gpt-4o-mini-tts"       # TTS model

OPENAI_TTS_VOICE="coral"                 # Voice (alloy, echo, fable, onyx, nova, shimmer, coral)

LAYERCODE_OUTPUT_ROOT="./conversations"  # Save location

LOGFIRE_TOKEN="..."                      # Enable LogFire observability

```
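
A quick pre-flight check can catch a missing variable before any conversation starts. The sketch below is not part of the toolkit: `missing_required_vars` is a hypothetical helper, and the only facts taken from this README are the two required variable names.

```python
import os
from typing import List, Mapping, Optional

REQUIRED_VARS = ("SERVER_URL", "LAYERCODE_AGENT_ID")


def missing_required_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variables that are unset or blank."""
    env = os.environ if env is None else env
    # Treat empty or whitespace-only values as missing.
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_required_vars()
    if missing:
        raise SystemExit(f"Missing required environment variables: {', '.join(missing)}")
    print("Environment looks complete")
```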

## LogFire Integration

Real-time observability and debugging with [LogFire](https://logfire.pydantic.dev/):

```bash

export LOGFIRE_TOKEN="your_token_here"

```

Automatically instruments PydanticAI and OpenAI calls, providing:

- Real-time conversation tracking

- Performance metrics and spans

- Error tracking with stack traces

- Beautiful UI for exploring conversations

## Type Safety

Enforces `mypy --strict` throughout. All event schemas use `TypedDict` or dataclasses.

```bash

uv run mypy src/layercode_gym

```

## Related Projects

- **[layercode-create-app](https://github.com/svilupp/layercode-create-app)** - CLI to scaffold LayerCode backends with tunneling

- **[layercode-examples](https://github.com/svilupp/layercode-examples)** - Agent patterns and integration recipes

## Documentation

Full documentation at [svilupp.github.io/layercode-gym](https://svilupp.github.io/layercode-gym)

- [Getting Started](https://svilupp.github.io/layercode-gym/getting-started)

- [Core Concepts](https://svilupp.github.io/layercode-gym/concepts)

- [Examples](https://svilupp.github.io/layercode-gym/examples)

- [API Reference](https://svilupp.github.io/layercode-gym/api-reference)

- [Advanced Usage](https://svilupp.github.io/layercode-gym/advanced)

## Contributing

This is a minimal, focused toolkit. Extend it through:

- Custom simulator strategies (implement `UserSimulatorProtocol`)

- Custom callbacks (implement `TurnCallback` or `ConversationCallback`)

- Custom TTS engines (implement `TTSEngineProtocol`)

Keep the core simple and extensible.

## License

MIT - See [LICENSE](LICENSE) file for details.