https://github.com/mrwong99/glyphoxa
AI-Powered Voice NPCs for Tabletop RPGs: a platform-agnostic, provider-independent voice AI framework written in Go
- Host: GitHub
- URL: https://github.com/mrwong99/glyphoxa
- Owner: MrWong99
- License: mit
- Created: 2026-02-25T00:19:58.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2026-02-25T01:22:37.000Z (4 months ago)
- Last Synced: 2026-02-25T06:58:31.629Z (4 months ago)
- Topics: ai, discord-bot, golang, mcp, npc, speech-to-text, tabletop-rpg, text-to-speech, ttrpg, voice-ai
- Language: Go
- Size: 5.59 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Glyphoxa
**AI-Powered Voice NPCs for Tabletop RPGs**: a platform-agnostic, provider-independent voice AI framework that brings your NPCs to life.
---
## What is Glyphoxa?
Glyphoxa is a real-time voice AI framework that brings AI-driven talking personas into live voice chat sessions. Built for tabletop RPGs, it serves as a persistent AI co-pilot for the Dungeon Master: it voices NPCs with distinct personalities, transcribes sessions, and answers rules questions, without ever replacing the human storyteller.
It is written in Go for native concurrency and targets sub-2-second mouth-to-ear latency.
> **⚠️ Early Alpha**: Glyphoxa is under active development. APIs may change between commits.
## Features
- **Voice NPC Personas**: AI-controlled NPCs with distinct voices, personalities, and backstories that speak in real time
- **Hybrid Memory System**: NPCs remember. A hot layer provides instant context, a cold layer holds deep history, and a knowledge graph tracks world state
- **MCP Tool Integration**: Plug-and-play tools (dice, rules lookup, image generation, web search) with performance-budgeted execution
- **Provider-Agnostic**: Swap the LLM, STT, TTS, or audio platform with a config change, not a rewrite
- **Sub-2s Latency**: End-to-end streaming pipeline with speculative pre-fetch and sentence-level TTS
- **Multi-NPC Orchestration**: Multiple NPCs with address detection, turn-taking, and priority-based audio mixing
- **Live Session Transcription**: Continuous STT with speaker identification for session logging and later lookup
- **Dual-Model Sentence Cascade**: Experimental. A fast model opens the reply and a strong model continues it, for a perceived voice onset under 600ms
- **Entity Management**: Pre-session world-building with YAML campaign files and VTT imports (Foundry VTT, Roll20)
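The "sentence-level TTS" idea in the latency bullet can be illustrated with a small sketch: instead of waiting for the full LLM reply, streamed text is cut at sentence boundaries so each sentence can be handed to speech synthesis as soon as it is complete. The `SentenceSplitter` type and its methods are hypothetical names chosen for this example and are not taken from the Glyphoxa codebase. The boundary rule is deliberately naive: it splits on `.`, `!`, and `?` and does not handle abbreviations or ellipses.

```go
package main

import (
	"fmt"
	"strings"
)

// SentenceSplitter buffers streamed text and emits complete sentences.
// Hypothetical helper for illustration; not part of Glyphoxa's API.
type SentenceSplitter struct {
	buf strings.Builder
}

// Push appends a streamed token and returns every sentence it completes.
func (s *SentenceSplitter) Push(token string) []string {
	s.buf.WriteString(token)
	text := s.buf.String()
	var out []string
	for {
		i := strings.IndexAny(text, ".!?")
		if i < 0 {
			break
		}
		// Include the terminator; it is a single ASCII byte.
		if sentence := strings.TrimSpace(text[:i+1]); sentence != "" {
			out = append(out, sentence)
		}
		text = text[i+1:]
	}
	// Keep the unfinished remainder for the next token.
	s.buf.Reset()
	s.buf.WriteString(text)
	return out
}

// Flush returns the trailing partial sentence, if any, and clears the buffer.
func (s *SentenceSplitter) Flush() string {
	rest := strings.TrimSpace(s.buf.String())
	s.buf.Reset()
	return rest
}

func main() {
	var s SentenceSplitter
	tokens := []string{"Well met", ", traveler. ", "What brings", " you here?", " Speak"}
	for _, tok := range tokens {
		for _, sentence := range s.Push(tok) {
			fmt.Println("speak:", sentence) // would be sent to TTS here
		}
	}
	if rest := s.Flush(); rest != "" {
		fmt.Println("speak:", rest)
	}
}
```

With this shape, synthesis of the first sentence can start while the model is still generating the second, which is where the latency saving would come from.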
## Architecture
```
┌──────────────────────────────────────────────────────────┐
│                     Audio Transport                      │
│               (Discord / WebRTC / Custom)                │
├────────────────────┬─────────────────────────────────────┤
│   Audio In (VAD)   │          Audio Out (Mixer)          │
├────────────────────┴─────────────────────────────────────┤
│               Agent Orchestrator + Router                │
│    ┌─────────┐  ┌─────────┐  ┌─────────┐                 │
│    │ NPC #1  │  │ NPC #2  │  │ NPC #3  │  ...            │
│    └────┬────┘  └────┬────┘  └────┬────┘                 │
├─────────┴────────────┴────────────┴──────────────────────┤
│                      Voice Engines                       │
│  Cascaded (STT→LLM→TTS) │ S2S (Gemini/OpenAI) │ Cascade  │
├──────────────────────────────────────────────────────────┤
│  Memory Subsystem         │  MCP Tool Execution          │
│  ┌─────┐ ┌─────┐ ┌─────┐  │  ┌──────┐ ┌──────┐           │
│  │ Log │ │ Vec │ │Graph│  │  │ Dice │ │Rules │ ...       │
│  └─────┘ └─────┘ └─────┘  │  └──────┘ └──────┘           │
└──────────────────────────────────────────────────────────┘
```
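As a rough illustration of the cascaded engine in the diagram (STT→LLM→TTS), the stages can be modeled as goroutines connected by channels, so a later stage starts working as soon as the earlier one emits something. This is a minimal sketch under stated assumptions: `stage`, `runCascade`, and the `fake*` functions are hypothetical stand-ins, and the real engine and provider interfaces in the repository may look quite different.

```go
package main

import (
	"fmt"
	"strings"
)

// stage applies fn to every item from in on its own goroutine and
// closes the returned channel once in is drained.
func stage(in <-chan string, fn func(string) string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for v := range in {
			out <- fn(v)
		}
	}()
	return out
}

// Stub providers: placeholders standing in for real STT, LLM, and TTS calls.
func fakeSTT(audio string) string { return strings.TrimPrefix(audio, "audio:") }
func fakeLLM(text string) string  { return "NPC reply to \"" + text + "\"" }
func fakeTTS(text string) string  { return "pcm(" + text + ")" }

// runCascade feeds utterances through STT, LLM, and TTS in order and
// collects the synthesized results.
func runCascade(utterances []string) []string {
	in := make(chan string)
	go func() {
		defer close(in)
		for _, u := range utterances {
			in <- u
		}
	}()
	out := stage(stage(stage(in, fakeSTT), fakeLLM), fakeTTS)
	var result []string
	for v := range out {
		result = append(result, v)
	}
	return result
}

func main() {
	for _, frame := range runCascade([]string{"audio:hello", "audio:roll initiative"}) {
		fmt.Println(frame)
	}
}
```

Because each stage is a single goroutine reading from one channel, utterance order is preserved end to end.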
## Quick Start
### Prerequisites
- **Go 1.26+** with CGo enabled
- **libopus**: `apt install libopus-dev` · `pacman -S opus` · `brew install opus`
- **ONNX Runtime**: from [onnxruntime releases](https://github.com/microsoft/onnxruntime/releases) (for Silero VAD)
### Build & Run
```bash
git clone https://github.com/MrWong99/glyphoxa.git
cd glyphoxa
# Build
make build
# Run
./bin/glyphoxa --config config.yaml
```
### Development
```bash
# Run tests with race detector (459 tests)
make test
# Full pre-commit check (fmt + vet + test)
make check
```
## Provider Support
| Component | Providers |
|-----------|-----------|
| **STT** | Deepgram Nova-3, whisper.cpp (local) |
| **LLM** | OpenAI, Anthropic, Google Gemini, Ollama (local), via [any-llm-go](https://github.com/mozilla-ai/any-llm-go) |
| **TTS** | ElevenLabs, Coqui XTTS (local) |
| **S2S** | Gemini Live, OpenAI Realtime |
| **Embeddings** | OpenAI, Ollama (local) |
| **Audio** | Discord, WebRTC |
| **Memory** | PostgreSQL + pgvector |
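One common way to make providers swappable "with a config change" is a small registry that maps a provider name from the configuration to a constructor behind a shared interface. The sketch below shows that pattern only. The `TTS` interface, `Register`, `New`, and the `cloud-stub`/`local-stub` names are all hypothetical and are not Glyphoxa's actual provider API, which lives under `pkg/provider/`.

```go
package main

import (
	"fmt"
	"sort"
)

// TTS is a hypothetical minimal text-to-speech interface.
type TTS interface {
	Synthesize(text string) ([]byte, error)
}

// factory builds a TTS provider instance.
type factory func() TTS

var registry = map[string]factory{}

// Register makes a provider available under the given name.
func Register(name string, f factory) { registry[name] = f }

// New looks up a provider by the name found in the configuration.
func New(name string) (TTS, error) {
	f, ok := registry[name]
	if !ok {
		names := make([]string, 0, len(registry))
		for n := range registry {
			names = append(names, n)
		}
		sort.Strings(names)
		return nil, fmt.Errorf("unknown TTS provider %q (registered: %v)", name, names)
	}
	return f(), nil
}

// echoTTS is a stub that returns the text itself as "audio".
type echoTTS struct{ prefix string }

func (e echoTTS) Synthesize(text string) ([]byte, error) {
	return []byte(e.prefix + text), nil
}

func init() {
	Register("cloud-stub", func() TTS { return echoTTS{prefix: "cloud:"} })
	Register("local-stub", func() TTS { return echoTTS{prefix: "local:"} })
}

func main() {
	// In a real setup this name would be read from the config file.
	tts, err := New("local-stub")
	if err != nil {
		fmt.Println(err)
		return
	}
	audio, _ := tts.Synthesize("Greetings")
	fmt.Println(string(audio))
}
```

Callers depend only on the interface, so switching from a cloud provider to a local one changes a string in the configuration rather than any calling code.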
## Performance Targets
| Metric | Target | Hard Limit |
|--------|--------|------------|
| Mouth-to-ear latency | < 1.2s | 2.0s |
| STT time-to-first-token | < 300ms | 500ms |
| LLM time-to-first-token | < 400ms | 800ms |
| TTS time-to-first-byte | < 200ms | 500ms |
| Concurrent NPC voices | ≥ 3 | ≥ 1 |
| Hot memory assembly | < 50ms | < 150ms |
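As a quick sanity check on the table, the three per-stage first-output budgets can be summed and compared against the mouth-to-ear figures. The sketch below assumes, for illustration only, that the stages run strictly one after another. In a streaming pipeline they overlap, and transport, VAD, and mixing also consume time, so the result is only a rough indication of how much of the budget is left for everything else. The type and function names are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// stageBudget holds the target and hard limit for one pipeline stage,
// copied from the performance table.
type stageBudget struct {
	name   string
	target time.Duration
	hard   time.Duration
}

var stages = []stageBudget{
	{"STT time-to-first-token", 300 * time.Millisecond, 500 * time.Millisecond},
	{"LLM time-to-first-token", 400 * time.Millisecond, 800 * time.Millisecond},
	{"TTS time-to-first-byte", 200 * time.Millisecond, 500 * time.Millisecond},
}

const (
	mouthToEarTarget = 1200 * time.Millisecond
	mouthToEarHard   = 2000 * time.Millisecond
)

// targetHeadroom is the time left under the 1.2s target after the
// summed stage targets.
func targetHeadroom() time.Duration {
	sum := time.Duration(0)
	for _, s := range stages {
		sum += s.target
	}
	return mouthToEarTarget - sum
}

// hardHeadroom is the time left under the 2.0s hard limit after the
// summed stage hard limits.
func hardHeadroom() time.Duration {
	sum := time.Duration(0)
	for _, s := range stages {
		sum += s.hard
	}
	return mouthToEarHard - sum
}

func main() {
	fmt.Println("target headroom:", targetHeadroom())   // 1200ms - 900ms
	fmt.Println("hard-limit headroom:", hardHeadroom()) // 2000ms - 1800ms
}
```

Under that sequential assumption, the stage targets add up to 900ms and the hard limits to 1800ms, leaving 300ms and 200ms respectively.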
## Project Structure
```
glyphoxa/
├── cmd/glyphoxa/          # Entry point
├── internal/
│   ├── agent/             # NPC agents, orchestrator, router, address detection
│   ├── config/            # Configuration schema and loader
│   ├── engine/            # Voice engines (S2S wrapper, sentence cascade)
│   ├── entity/            # Entity management (CRUD, YAML, VTT import)
│   ├── hotctx/            # Hot context assembly and formatting
│   ├── mcp/               # MCP host, bridge, budget tiers, built-in tools
│   └── transcript/        # Transcript correction pipeline
├── pkg/
│   ├── audio/             # Platform + Connection interfaces, mixer, WebRTC
│   ├── memory/            # Store interface, PostgreSQL + pgvector, knowledge graph
│   └── provider/          # LLM, STT, TTS, S2S, VAD, Embeddings interfaces + impls
├── docs/design/           # Architecture and design documents
├── research/              # Research notes
└── configs/               # Example configuration files
```
## Documentation
Comprehensive guides for developers and contributors. See the [full documentation index](docs/README.md).
| Guide | Description |
|-------|-------------|
| [Getting Started](docs/getting-started.md) | Prerequisites, build, first run |
| [Architecture](docs/architecture.md) | System layers, data flow, key packages |
| [Configuration](docs/configuration.md) | Complete config field reference |
| [Providers](docs/providers.md) | Provider system, adding new providers |
| [NPC Agents](docs/npc-agents.md) | NPC definition, entities, campaigns |
| [Memory](docs/memory.md) | 3-layer memory system |
| [MCP Tools](docs/mcp-tools.md) | Tool system, building custom tools |
| [Audio Pipeline](docs/audio-pipeline.md) | Audio flow, VAD, engine types |
| [Commands](docs/commands.md) | Discord slash and voice commands |
| [Deployment](docs/deployment.md) | Docker Compose, production setup |
| [Observability](docs/observability.md) | Metrics, Grafana, health endpoints |
| [Testing](docs/testing.md) | Test conventions and patterns |
| [Troubleshooting](docs/troubleshooting.md) | Common issues and debugging |
## Design Documents
| Document | Description |
|----------|-------------|
| [Overview](docs/design/00-overview.md) | Vision, goals, product principles |
| [Architecture](docs/design/01-architecture.md) | System layers and data flow |
| [Providers](docs/design/02-providers.md) | LLM, STT, TTS, Audio platform interfaces |
| [Memory](docs/design/03-memory.md) | Hybrid memory system and knowledge graph |
| [MCP Tools](docs/design/04-mcp-tools.md) | Tool integration and performance budgets |
| [Sentence Cascade](docs/design/05-sentence-cascade.md) | ⚠️ Dual-model cascade (experimental) |
| [NPC Agents](docs/design/06-npc-agents.md) | Agent design and multi-NPC orchestration |
| [Technology](docs/design/07-technology.md) | Technology decisions and latency budget |
| [Roadmap](docs/design/09-roadmap.md) | Development phases |
| [Knowledge Graph](docs/design/10-knowledge-graph.md) | L3 graph schema and query patterns |
## Contributing
See [CONTRIBUTING.md](CONTRIBUTING.md) for development setup, code style, and workflow guidelines.
- **Bugs**: [Bug Report](.github/ISSUE_TEMPLATE/bug_report.yml)
- **Features**: [Feature Request](.github/ISSUE_TEMPLATE/feature_request.yml)
- **Security**: [SECURITY.md](SECURITY.md)
## License
[GPL v3](LICENSE) © Glyphoxa Contributors