https://github.com/stimm-ai/stimm
The Open Source Voice Agent Platform. Orchestrate ultra-low latency AI pipelines for real-time conversations over WebRTC.
- Host: GitHub
- URL: https://github.com/stimm-ai/stimm
- Owner: stimm-ai
- License: agpl-3.0
- Created: 2025-12-04T08:34:52.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2025-12-15T16:18:46.000Z (7 months ago)
- Last Synced: 2025-12-18T09:37:15.253Z (7 months ago)
- Topics: conversational-ai, docker, livekit, livekit-sdk, livekit-server, llm, low-latency, orchestration, python, realtime, stt, tts, voice-agent, voice-ai, voice-assistant, voicebot, webrtc
- Language: Python
- Homepage:
- Size: 9.56 MB
- Stars: 28
- Watchers: 1
- Forks: 3
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# stimm
Dual-agent voice orchestration built on [livekit-agents](https://github.com/livekit/agents):
one agent talks fast, one agent thinks deep, both collaborate in real-time.
## Why Stimm
- Low-latency conversational voice loop (`VAD → STT → fast LLM → TTS`).
- High-capability supervisor loop (tool use, planning, contextual steering).
- Typed protocol for Python + TypeScript supervisors.
- Runtime-safe provider contract + generated provider catalog from LiveKit docs.
- Integrator-friendly onboarding flow (discover providers first, install extras second).
## Install
```bash
# 1) Core package only (best first step for setup wizards)
pip install stimm
# 2) Install only the providers selected by the user
# (quote the extras so shells such as zsh do not expand the brackets)
pip install "stimm[deepgram,openai]"
# Optional: install all runtime-supported providers
pip install "stimm[all]"
# TypeScript supervisor client
npm install @stimm/protocol
```
Plugin dependencies are installed in the integrator app environment. Stimm does
not vendor provider plugin code inside its wheel.
## Wizard-first Provider Flow
For onboarding UIs, use the catalog API to display providers/parameters, then
derive extras from the user selection:
```python
from stimm import extras_install_command, get_provider_catalog
catalog = get_provider_catalog() # exhaustive stt/tts/llm + parameters
cmd = extras_install_command(stt="deepgram", tts="openai", llm="azure-openai")
print(cmd) # pip install stimm[deepgram,openai]
```
After extras installation, restart the Python process before instantiating
LiveKit plugin classes.
Extension/wizard migration details are documented in
[docs/EXTENSION_WIZARD_INTEGRATION.md](docs/EXTENSION_WIZARD_INTEGRATION.md).
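In the `extras_install_command` example above, three provider selections yield only two extras. The sketch below shows one way such a derivation could work. It is not Stimm's implementation: the `PROVIDER_TO_EXTRA` mapping and the `sketch_install_command` helper are hypothetical, and the assumption that `azure-openai` shares the `openai` extra is inferred only from the printed output.

```python
# Hypothetical provider-to-extra mapping, for illustration only.
# Assumption: "azure-openai" is served by the same extra as "openai".
PROVIDER_TO_EXTRA = {
    "deepgram": "deepgram",
    "openai": "openai",
    "azure-openai": "openai",
}


def sketch_install_command(stt: str, tts: str, llm: str) -> str:
    """Build a pip command from the selected providers, without duplicate extras."""
    extras: list[str] = []
    for provider in (stt, tts, llm):
        extra = PROVIDER_TO_EXTRA[provider]
        # Keep first-seen order and skip extras that are already listed.
        if extra not in extras:
            extras.append(extra)
    return f"pip install stimm[{','.join(extras)}]"


command = sketch_install_command(stt="deepgram", tts="openai", llm="azure-openai")
print(command)
```

For real onboarding flows, prefer the library's own `extras_install_command`, since it is generated from the actual provider catalog.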
## Quick Start
### Voice Agent (Python)
```python
from stimm import VoiceAgent
from livekit.plugins import deepgram, openai, silero

agent = VoiceAgent(
    stt=deepgram.STT(),
    tts=openai.TTS(),
    vad=silero.VAD.load(),
    fast_llm=openai.LLM(model="gpt-4o-mini"),
    buffering_level="MEDIUM",
    mode="hybrid",
    instructions="You are a helpful voice assistant.",
)

if __name__ == "__main__":
    from livekit.agents import WorkerOptions, cli

    cli.run_app(WorkerOptions(entrypoint_fnc=agent.entrypoint))
```
### Supervisor (Python)
```python
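from stimm import Supervisor, TranscriptMessage


class MySupervisor(Supervisor):
    async def on_transcript(self, msg: TranscriptMessage):
        # Act only on final transcripts, not partial (interim) ones.
        if not msg.partial:
            # `my_big_llm` is a placeholder for your own reasoning model or agent.
            result = await my_big_llm.process(msg.text)
            await self.instruct(result.text, speak=True)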
```
### Supervisor (TypeScript)
```typescript
import { StimmSupervisorClient } from "@stimm/protocol";

const client = new StimmSupervisorClient({
  livekitUrl: "ws://localhost:7880",
  token: supervisorToken,
});

client.on("transcript", async (msg) => {
  if (!msg.partial) {
    const result = await myAgent.process(msg.text);
    await client.instruct({ text: result, speak: true, priority: "normal" });
  }
});

await client.connect();
```
## Core Concepts
### Dual-Agent Architecture
Stimm is fundamentally built around two cooperating agents:
- `VoiceAgent`: optimized for low-latency spoken interaction.
- `Supervisor`: optimized for deeper reasoning, planning, and tool orchestration.
They exchange typed protocol messages over LiveKit data channels, allowing fast
turn-by-turn response while retaining high-level control and context.
| Component | Role |
|---|---|
| `VoiceAgent` | Handles live turn-by-turn speech interaction |
| `Supervisor` | Watches transcript and steers behavior asynchronously |
| `StimmProtocol` | Structured messages over LiveKit data channels |
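To make "typed protocol messages over data channels" concrete, here is a minimal sketch of how an instruction could be encoded to bytes and decoded back. It is not Stimm's actual schema: the `InstructionSketch` class, the `type` discriminator field, and the JSON wire format are assumptions for illustration. Only the `text`, `speak`, and `priority` fields mirror the `instruct` call shown in the TypeScript quick start.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class InstructionSketch:
    """Hypothetical supervisor-to-voice-agent message (not Stimm's real schema)."""

    text: str
    speak: bool = True
    priority: str = "normal"
    type: str = "instruction"  # assumed discriminator for routing messages


def encode(msg: InstructionSketch) -> bytes:
    # Data channels carry raw bytes, so serialize the message as UTF-8 JSON.
    return json.dumps(asdict(msg)).encode("utf-8")


def decode(payload: bytes) -> InstructionSketch:
    data = json.loads(payload.decode("utf-8"))
    # Reject payloads that are not instructions before building the typed object.
    if data.get("type") != "instruction":
        raise ValueError(f"unexpected message type: {data.get('type')!r}")
    return InstructionSketch(**data)


wire = encode(InstructionSketch(text="Offer to book a follow-up call."))
roundtrip = decode(wire)
print(roundtrip)
```

Validating the payload at the boundary means the receiving agent only ever handles well-formed, typed messages, whichever side (Python or TypeScript) produced them.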
Modes:
- `autonomous`: voice agent acts independently.
- `relay`: voice agent only speaks supervisor instructions.
- `hybrid` (default): autonomous with supervisor steering.
Pre-TTS buffering levels:
- `NONE`, `LOW`, `MEDIUM` (default), `HIGH`.
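The modes and buffering levels listed above can be modeled as enumerations so that a typo in a configuration value fails early. This is a sketch under stated assumptions: the `Mode` and `BufferingLevel` enums and both helper functions are hypothetical and not part of the Stimm API. Only the string values and the defaults come from the lists above.

```python
from enum import Enum


class Mode(str, Enum):
    AUTONOMOUS = "autonomous"  # voice agent acts independently
    RELAY = "relay"            # voice agent only speaks supervisor instructions
    HYBRID = "hybrid"          # autonomous with supervisor steering


class BufferingLevel(str, Enum):
    NONE = "NONE"
    LOW = "LOW"
    MEDIUM = "MEDIUM"
    HIGH = "HIGH"


def resolve_settings(
    mode: str = "hybrid", buffering_level: str = "MEDIUM"
) -> tuple[Mode, BufferingLevel]:
    """Validate raw strings; an unknown value raises ValueError."""
    return Mode(mode), BufferingLevel(buffering_level)


def speaks_on_its_own(mode: Mode) -> bool:
    # In relay mode the voice agent only voices what the supervisor sends.
    return mode is not Mode.RELAY


settings = resolve_settings()
print(settings, speaks_on_its_own(settings[0]))
```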
## Developer Workflow
```bash
# Install dev dependencies
pip install -e ".[dev]"
# Local infra
docker compose up -d
# Build local artifacts + sync providers + validate runtime contract
bash scripts/dev_build.sh
# Tests / lint
pytest
ruff check src/ tests/
# Catalog/contract checks (CI-equivalent)
python3 scripts/sync_livekit_plugins.py --check
python3 scripts/validate_runtime_contract.py --import-check
```
`scripts/dev_build.sh` is the single local command to rebuild protocol artifacts
and provider metadata from the LiveKit source of truth.
## Documentation
- Docusaurus docs site source: [website](website)
- Extension wizard integration: [website/docs/integrations/wizard.md](website/docs/integrations/wizard.md)
- Supervisor observability integration: [website/docs/integrations/supervisor-observability.md](website/docs/integrations/supervisor-observability.md)
## License
MIT