https://github.com/rylinjames/litmus
Record and deterministically replay AI agent executions. Flight recorder for LLM agents. Fault injection, reliability scoring, CI gating.
ai-agents cli developer-tools fault-injection llm observability python reliability replay testing
- Host: GitHub
- URL: https://github.com/rylinjames/litmus
- Owner: rylinjames
- License: mit
- Created: 2026-03-25T06:35:27.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2026-03-26T07:07:31.000Z (3 months ago)
- Last Synced: 2026-03-27T02:32:25.224Z (3 months ago)
- Topics: ai-agents, cli, developer-tools, fault-injection, llm, observability, python, reliability, replay, testing
- Language: Python
- Homepage: https://pypi.org/project/litmus-trace/
- Size: 958 KB
- Stars: 2
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-testing - Litmus - Record and replay AI agent LLM calls deterministically for testing and CI, with fault injection and reliability scoring. (Software / AI & LLM Testing)
README
# Litmus
**Record and deterministically replay AI agent executions.**
Litmus captures every LLM API call your agent makes, saving structured trace files you can inspect, share, and (soon) replay.
```bash
pip install litmus-trace
```
## Quick Start — Zero Code Changes
```bash
# Record your agent (wraps the process, captures all LLM calls)
litmus run python my_agent.py
# View the trace
litmus view ./traces/lt-abc123.trace.json
```
Your agent code stays completely unchanged. Litmus patches the SDK transport layer at runtime.
## What It Does
### Free (works offline, no account needed)
**Record** — Intercepts every HTTP call to LLM APIs (Anthropic, OpenAI, Mistral, 14+ providers). Saves the full request and response as a trace file. API keys are automatically redacted.
**View** — Pretty-print traces with step-by-step details, latency, and model info.
### Coming Soon (Litmus Cloud)
**Replay** — Feed recorded responses back to your agent. Same code path, same output, no real API calls.
**Fault Injection** — Mutate recorded responses to test resilience. What happens when Claude refuses? When GPT returns a 500? When the API times out?
**CI Gating** — Score your trace corpus for reliability and block deploys that drop below a threshold.
Join the [Discord](https://discord.gg/fA2SHvHb2D) to get notified when these features launch.
## Three Ways to Record
### 1. CLI Wrapper (recommended — zero code changes)
```bash
litmus run python my_agent.py
```
### 2. One-Line Python API
```python
import litmus
litmus.record()
# ... your existing agent code, unchanged ...
litmus.stop()
```
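If the agent can raise partway through a run, it may help to guarantee that `litmus.stop()` is still called. The following is a minimal sketch, assuming (this README does not confirm it) that `stop()` is what finalizes the trace. The `recording` helper is hypothetical and not part of Litmus:

```python
from contextlib import contextmanager

@contextmanager
def recording(start, stop):
    """Call start() on entry and stop() on exit, even if the body raises."""
    start()
    try:
        yield
    finally:
        stop()

# Usage with Litmus (assumes `import litmus` succeeds in your environment):
#   with recording(litmus.record, litmus.stop):
#       run_agent()  # hypothetical entry point for your agent
```

Passing the two functions in, rather than importing `litmus` inside the helper, keeps the sketch testable without the package installed.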
### 3. Proxy Mode (any language, advanced use)
```bash
litmus proxy --mode record
# Then point your SDK:
ANTHROPIC_BASE_URL=http://localhost:8787/anthropic python my_agent.py
```
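The same redirect can be set up from a Python launcher script instead of the shell. This sketch assumes the proxy from the command above is already listening on `localhost:8787`. `proxied_env` and `run_agent` are hypothetical helpers, and only the Anthropic route shown above is used:

```python
import os
import subprocess
import sys

# Base URL copied from the shell example above.
LITMUS_ANTHROPIC_URL = "http://localhost:8787/anthropic"

def proxied_env(base_env=None):
    """Return a copy of the environment with the Anthropic SDK pointed at the proxy."""
    env = dict(os.environ if base_env is None else base_env)
    env["ANTHROPIC_BASE_URL"] = LITMUS_ANTHROPIC_URL
    return env

def run_agent(script="my_agent.py"):
    """Run the agent script in a child process that talks to the proxy."""
    return subprocess.run([sys.executable, script], env=proxied_env(), check=False)
```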
## Supported Providers
Litmus recognizes the following providers out of the box:
| Provider | Status |
|----------|--------|
| Anthropic (Claude) | Tested |
| OpenAI (GPT) | Tested |
| Google (Gemini) | Supported |
| Mistral | Supported |
| Cohere | Supported |
| Groq | Supported |
| Together AI | Supported |
| Fireworks AI | Supported |
| DeepSeek | Supported |
| Perplexity | Supported |
| OpenRouter | Supported |
| Ollama (local) | Supported |
| vLLM (local) | Supported |
| LM Studio (local) | Supported |
**Custom/self-hosted models:**
```bash
litmus proxy --provider my-model=https://my-finetuned-llama.example.com/v1
```
## CLI Reference
```
litmus run        Wrap a command to record (zero code changes)
litmus view       Pretty-print a trace file
litmus proxy      Start the recording proxy server
litmus providers  List all supported providers
litmus replay     Replay a trace (coming soon, requires Litmus Cloud)
litmus ci         Score traces and gate deploys (coming soon, requires Litmus Cloud)
```
## How It Works
Litmus monkey-patches the `httpx` transport layer used by both the Anthropic and OpenAI Python SDKs. When you call `client.messages.create(...)`, Litmus intercepts the HTTP request before it leaves your machine.
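To illustrate the general pattern, here is a minimal sketch of wrapping a transport's request handler at runtime. `FakeTransport` is a stand-in for an HTTP transport class and is not `httpx` itself. The names `patch_transport`, `unpatch_transport`, and `captured` are hypothetical, and this is not Litmus's actual implementation:

```python
import functools

class FakeTransport:
    """Stand-in for an HTTP transport; a real one would send the request over the network."""
    def handle_request(self, request):
        return {"status": 200, "body": f"echo:{request['body']}"}

captured = []  # (request, response) pairs seen by the patch

def patch_transport(cls):
    """Replace cls.handle_request with a wrapper that records every exchange."""
    original = cls.handle_request

    @functools.wraps(original)
    def wrapper(self, request):
        response = original(self, request)  # the real call still goes through
        captured.append((request, response))
        return response

    cls.handle_request = wrapper
    return original  # keep a handle so the patch can be undone

def unpatch_transport(cls, original):
    """Restore the unwrapped handler."""
    cls.handle_request = original

original = patch_transport(FakeTransport)
FakeTransport().handle_request({"url": "/v1/messages", "body": "hi"})
unpatch_transport(FakeTransport, original)
```

Because the patch is applied to the class rather than to one instance, every client created afterwards is intercepted, which is what makes a zero-code-change wrapper possible.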
**Record mode:** The real API call goes through. Litmus captures the request and response, then saves them to a trace file. API keys are automatically redacted.
**Replay mode (coming soon):** The real API is never called. Litmus serves the recorded response directly from the trace file, so your agent gets the exact same response it got during recording: same tool calls, same content, same stop reason.
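The two modes can be sketched as a recorder that appends each exchange to a list and a replayer that serves those responses back in order. This is an illustration only: the trace layout (`{"steps": [...]}`) and the class names are assumptions, not the Litmus trace format:

```python
import json

class Recorder:
    """Record mode: forward the call to the real backend and keep a copy."""
    def __init__(self, backend):
        self.backend = backend
        self.steps = []

    def send(self, request):
        response = self.backend(request)
        self.steps.append({"request": request, "response": response})
        return response

    def dumps(self):
        return json.dumps({"steps": self.steps})

class Replayer:
    """Replay mode: never call the backend; serve recorded responses in order."""
    def __init__(self, trace_json):
        self.steps = json.loads(trace_json)["steps"]
        self.cursor = 0

    def send(self, request):
        if self.cursor >= len(self.steps):
            raise RuntimeError("trace exhausted: agent made more calls than were recorded")
        response = self.steps[self.cursor]["response"]
        self.cursor += 1
        return response

def fake_llm(request):
    # Stand-in for a real API call.
    return {"content": request["prompt"].upper(), "stop_reason": "end_turn"}

recorder = Recorder(fake_llm)
live = [recorder.send({"prompt": p}) for p in ("plan", "act")]
replayer = Replayer(recorder.dumps())
replayed = [replayer.send({"prompt": p}) for p in ("plan", "act")]
```

Both classes expose the same `send` method, so the calling code cannot tell which mode it is running in.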
## Security
- API keys (`Authorization`, `x-api-key`) are **automatically redacted** from trace headers
- Use `--compact` to strip request bodies for smaller trace files
- Note: message content in request/response bodies is NOT redacted — don't include secrets in your prompts
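Header redaction of this kind can be sketched as follows. The header names come from the list above; the `[REDACTED]` placeholder and the `redact_headers` name are assumptions, not Litmus's actual output:

```python
SENSITIVE_HEADERS = {"authorization", "x-api-key"}

def redact_headers(headers):
    """Return a copy of headers with credential values replaced.

    Header names are matched case-insensitively.
    """
    return {
        name: "[REDACTED]" if name.lower() in SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }

safe = redact_headers({
    "Authorization": "Bearer sk-example",
    "x-api-key": "example-key",
    "content-type": "application/json",
})
```

The sketch touches headers only. As the note above says, anything inside the request or response body passes through unchanged.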
## Limitations
- **Python only** — the monkey-patch approach (`litmus run`, `litmus.record()`) requires Python. Use proxy mode for other languages.
- **httpx-based SDKs** — works with SDKs that use `httpx` under the hood (Anthropic, OpenAI, Mistral, Cohere, etc). SDKs using `requests` or `aiohttp` are not intercepted.
- **Sequential replay** — responses are served in recorded order. Agents that make calls in a different order on replay will get mismatched responses.
- **No tool call recording** — only LLM API calls are captured. External tool calls (database, HTTP APIs) are not recorded.
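The sequential-replay limitation can be made concrete with a small sketch. Here each recorded step stores a fingerprint of its request, so a replay that arrives out of order fails loudly instead of returning the wrong response. The fingerprint check is a hypothetical guard for illustration; this README does not say whether Litmus performs one:

```python
import hashlib
import json

def fingerprint(request):
    """Stable hash of a JSON-serializable request."""
    payload = json.dumps(request, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def record(requests, backend):
    """Call the backend for each request and store a fingerprint with each response."""
    return [{"fp": fingerprint(r), "response": backend(r)} for r in requests]

def replay(steps, requests):
    """Serve responses in recorded order, raising on the first out-of-order request."""
    out = []
    for index, (step, request) in enumerate(zip(steps, requests)):
        if step["fp"] != fingerprint(request):
            raise ValueError(f"step {index}: request does not match the recording")
        out.append(step["response"])
    return out

steps = record([{"prompt": "a"}, {"prompt": "b"}], lambda r: r["prompt"] + "!")
in_order = replay(steps, [{"prompt": "a"}, {"prompt": "b"}])
```

Without the check, swapping the two requests would silently hand the response for `a` to the call for `b`, which is the mismatch the limitation describes.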
## Community
- [Discord](https://discord.gg/Nmr6tBx4xQ) — fastest way to get help, share traces, and request features
- [GitHub Issues](https://github.com/rylinjames/litmus/issues) — bug reports and feature requests
- [PyPI](https://pypi.org/project/litmus-trace/) — package
## Talk to Me
I'm building Litmus in the open and I want to hear from you — whether it's a bug, a feature idea, or just telling me about your agent setup. I personally respond to everything.
- **Email:** romirj@gmail.com
- **Discord:** romirj ([join the server](https://discord.gg/Nmr6tBx4xQ))
- **Twitter/X:** [@romir_jain](https://twitter.com/romir_jain)
If you're running agents in production and want to use Litmus, I'll personally help you set it up. DM me anywhere.
## Why Litmus?
**Observability tools** (LangSmith, Langfuse) tell you what happened. They log traces.
**Litmus captures the full picture.** Every LLM call, every response, every token — in a structured trace file you can inspect, share, and (soon) replay deterministically with fault injection.
LangSmith is the dashcam. Litmus is building the crash test facility.
## License
MIT