https://github.com/crafter-station/trx
Agent-first CLI for audio/video transcription via Whisper
- Host: GitHub
- URL: https://github.com/crafter-station/trx
- Owner: crafter-station
- License: mit
- Created: 2026-03-31T01:07:12.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2026-04-01T02:53:04.000Z (2 months ago)
- Last Synced: 2026-04-05T01:10:02.328Z (2 months ago)
- Topics: agent, audio, captions, cli, speech-to-text, srt, subtitles, transcription, video, whisper
- Language: TypeScript
- Homepage: https://trx.crafter.run
- Size: 358 KB
- Stars: 72
- Watchers: 0
- Forks: 13
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
# @crafter/trx
Agent-first CLI for audio/video transcription via [Whisper](https://github.com/ggml-org/whisper.cpp).
Downloads, cleans, and transcribes media from URLs or local files with machine-readable output designed for AI agents.
## Install
```bash
bun add -g @crafter/trx
trx init
```
`trx init` installs dependencies (`whisper-cli`, `yt-dlp`, `ffmpeg` via Homebrew), downloads a Whisper model, and optionally installs the agent skill for your AI coding tool.
### Skill Only
If you already have trx set up and just want the agent skill:
```bash
npx skills add crafter-station/trx -g
```
## Usage
```bash
# Transcribe a local file
trx recording.mp4
# Transcribe from URL (YouTube, Twitter, Instagram, etc.)
trx "https://youtube.com/watch?v=..."
# Agent-friendly JSON output
trx transcribe video.mp4 --output json
# Only get the text (saves tokens)
trx transcribe video.mp4 --fields text --output json
# Dry-run (validate without executing)
trx transcribe video.mp4 --dry-run --output json
# Specify language
trx transcribe video.mp4 --language es
# Schema introspection for agents
trx schema transcribe
```
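For agents that invoke `trx` programmatically, the flags above can be assembled into an argument list. The following is a minimal sketch, not part of trx: `TranscribeOptions` and `buildTrxArgs` are hypothetical names, and only flags shown in the usage examples are used. The `fields` value is passed through verbatim, since the syntax for requesting more than one field is not documented here.

```typescript
// Hypothetical helper, not shipped with trx.
// It only uses flags that appear in the usage examples above.
interface TranscribeOptions {
  input: string;      // local file path or media URL
  fields?: string;    // passed verbatim to --fields, e.g. "text"
  language?: string;  // e.g. "es"
  dryRun?: boolean;   // validate without executing
}

function buildTrxArgs(opts: TranscribeOptions): string[] {
  // Always request JSON so the caller can parse the result.
  const args: string[] = ["transcribe", opts.input, "--output", "json"];
  if (opts.fields) {
    args.push("--fields", opts.fields);
  }
  if (opts.language) {
    args.push("--language", opts.language);
  }
  if (opts.dryRun) {
    args.push("--dry-run");
  }
  return args;
}
```

The resulting array can be handed to a process spawner such as `execFile("trx", args)` from `node:child_process`. Passing arguments as an array rather than a shell string avoids quoting problems with URLs.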
## Commands
| Command | Description |
|---------|-------------|
| `trx <input>` | Shorthand for `trx transcribe <input>` |
| `trx init` | Install deps + download Whisper model |
| `trx transcribe <input>` | Full transcription pipeline |
| `trx doctor` | Check dependency status |
| `trx schema <command>` | JSON schema introspection |
## Agent-First Design
Built following [agent-first CLI principles](https://justin.poehnelt.com/posts/rewrite-your-cli-for-ai-agents/):
- **`--output`** auto-detects its format: table for a TTY, JSON when piped (`--output json` forces JSON)
- **`--dry-run`** validates before executing
- **`--fields`** limits response size to protect agent context windows
- **`trx schema`** runtime introspection (no docs needed)
- **Input validation** rejects control characters, path traversals, and URL-encoded strings
- **Ships with SKILL.md** for Claude Code agent post-processing
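The input validation rule above can be sketched as a small pure function. This is an illustration under stated assumptions, not trx's actual implementation: `validateInput` and `ValidationResult` are hypothetical names, and how trx treats legitimately percent-encoded URLs is not documented here, so this sketch simply rejects any `%XX` sequence.

```typescript
// Hypothetical sketch of the three checks named above; trx's real rules may differ.
type ValidationResult = { ok: true } | { ok: false; reason: string };

function validateInput(input: string): ValidationResult {
  // Control characters: U+0000 to U+001F, plus DEL (U+007F).
  if (/[\u0000-\u001f\u007f]/.test(input)) {
    return { ok: false, reason: "control character" };
  }
  // Path traversal: any ".." segment, split on "/" or "\".
  if (input.split(/[\\/]/).indexOf("..") !== -1) {
    return { ok: false, reason: "path traversal" };
  }
  // URL-encoded content: a percent sign followed by two hex digits.
  if (/%[0-9a-fA-F]{2}/.test(input)) {
    return { ok: false, reason: "URL-encoded string" };
  }
  return { ok: true };
}
```

Rejecting these inputs up front matters for agent callers, which may produce hallucinated or double-encoded paths that a human would never type.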
## Agent Skill
The bundled skill (`skills/trx/SKILL.md`) enables AI agents to:
1. Transcribe media via CLI
2. Post-process output (fix punctuation, accents, technical terms, repeated phrases)
3. Reference `whisper-fixes.md` for common Whisper mistake patterns
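As one illustration of the post-processing in step 2, repeated phrases can be collapsed mechanically before an agent handles the subtler fixes. This is a generic sketch: the contents of `whisper-fixes.md` are not reproduced here, and `collapseRepeatedLines` is a hypothetical helper rather than something the skill ships.

```typescript
// Hypothetical helper: drops blank lines and consecutive duplicate lines,
// comparing case-insensitively after trimming whitespace.
function collapseRepeatedLines(lines: string[]): string[] {
  const out: string[] = [];
  let previousKey: string | null = null;
  for (const line of lines) {
    const trimmed = line.trim();
    if (trimmed === "") {
      continue; // skip blank lines without resetting the comparison
    }
    const key = trimmed.toLowerCase();
    if (key !== previousKey) {
      out.push(trimmed);
    }
    previousKey = key;
  }
  return out;
}
```

Only consecutive duplicates are removed, so a phrase that legitimately recurs later in the transcript is kept.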
## Pipeline
```
Input (URL or file)
  |
  v
[yt-dlp] Download media (if URL)
  |
  v
[ffmpeg] Clean audio (silence removal, noise reduction, normalization)
  |
  v
[whisper-cli] Transcribe (local Whisper model)
  |
  v
Output: .wav + .srt + .txt + JSON
```
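The stages above can be sketched as a function that plans one command per tool. This is an approximation, not trx's implementation: `planPipeline`, `Step`, and `PlanOptions` are hypothetical names, the file names under `workDir` are made up, and the exact ffmpeg filters and whisper-cli flags trx uses are not documented here. The flags shown are standard options of each tool (`silenceremove`, `afftdn`, and `loudnorm` are ffmpeg audio filters; `-osrt`, `-otxt`, and `-oj` are whisper-cli output switches).

```typescript
// Hypothetical planner: returns the commands without running them.
interface Step {
  tool: string;
  args: string[];
}

interface PlanOptions {
  input: string;      // URL or local file
  workDir: string;    // assumed scratch directory
  modelPath: string;  // absolute path to a ggml model file
  language: string;   // e.g. "auto" or "es"
  threads: number;
}

function isUrl(input: string): boolean {
  return /^https?:\/\//i.test(input);
}

function planPipeline(o: PlanOptions): Step[] {
  const steps: Step[] = [];
  const clean = `${o.workDir}/clean.wav`;
  let source = o.input;

  if (isUrl(o.input)) {
    // Download and extract audio as WAV; yt-dlp fills in %(ext)s.
    steps.push({
      tool: "yt-dlp",
      args: ["-x", "--audio-format", "wav", "-o", `${o.workDir}/raw.%(ext)s`, o.input],
    });
    source = `${o.workDir}/raw.wav`;
  }

  // Assumed filter chain: silence removal, noise reduction, loudness normalization.
  // 16 kHz mono is the input format whisper.cpp conventionally expects.
  steps.push({
    tool: "ffmpeg",
    args: [
      "-y", "-i", source,
      "-af", "silenceremove=stop_periods=-1:stop_duration=1:stop_threshold=-50dB,afftdn,loudnorm",
      "-ar", "16000", "-ac", "1", clean,
    ],
  });

  // Transcribe and write .srt, .txt, and JSON next to the given output prefix.
  steps.push({
    tool: "whisper-cli",
    args: [
      "-m", o.modelPath, "-f", clean, "-l", o.language, "-t", String(o.threads),
      "-osrt", "-otxt", "-oj", "-of", `${o.workDir}/transcript`,
    ],
  });

  return steps;
}
```

Planning the commands separately from executing them is also what makes a `--dry-run` style check cheap: the plan can be validated and printed without touching the network or the model.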
## Configuration
Stored at `~/.trx/config.json` after `trx init`:
```json
{
  "modelPath": "~/.trx/models/ggml-small.bin",
  "modelSize": "small",
  "language": "auto",
  "threads": 8
}
```
Models: `tiny` (75MB) | `base` (142MB) | `small` (466MB) | `medium` (1.5GB) | `large` (3GB)
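A tool that reads this file can type and validate it before use. The sketch below mirrors the four keys shown above; `TrxConfig`, `parseConfig`, and `expandHome` are hypothetical names, and whether trx itself expands `~` or validates the file this way is an assumption.

```typescript
// Hypothetical types mirroring the config.json keys shown above.
type ModelSize = "tiny" | "base" | "small" | "medium" | "large";

interface TrxConfig {
  modelPath: string;
  modelSize: ModelSize;
  language: string;
  threads: number;
}

const MODEL_SIZES: readonly string[] = ["tiny", "base", "small", "medium", "large"];

// Replace a leading "~" with the given home directory.
function expandHome(p: string, home: string): string {
  return p === "~" || p.startsWith("~/") ? home + p.slice(1) : p;
}

function parseConfig(json: string, home: string): TrxConfig {
  const raw = JSON.parse(json) as Record<string, unknown>;
  const { modelPath, modelSize, language, threads } = raw;

  if (typeof modelPath !== "string") {
    throw new Error("modelPath must be a string");
  }
  if (typeof modelSize !== "string" || MODEL_SIZES.indexOf(modelSize) === -1) {
    throw new Error("modelSize must be one of: " + MODEL_SIZES.join(", "));
  }
  if (typeof language !== "string") {
    throw new Error("language must be a string");
  }
  if (typeof threads !== "number" || !Number.isInteger(threads) || threads < 1) {
    throw new Error("threads must be a positive integer");
  }

  return {
    modelPath: expandHome(modelPath, home),
    modelSize: modelSize as ModelSize,
    language,
    threads,
  };
}
```

Expanding `~` explicitly matters when the path is later passed to a child process as an argument, because no shell is involved to do the expansion.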
## License
MIT