https://github.com/rylinjames/litmus

Record and deterministically replay AI agent executions. Flight recorder for LLM agents. Fault injection, reliability scoring, CI gating.
ai-agents cli developer-tools fault-injection llm observability python reliability replay testing


# Litmus

**Record and deterministically replay AI agent executions.**


Litmus demo — record, replay, fault inject

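Litmus captures every LLM API call your agent makes, including any tool calls the model requests in its responses, and saves structured trace files you can inspect, share, and (soon) replay. Calls your agent makes to external tools are not recorded (see Limitations).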

```bash
pip install litmus-trace
```

## Quick Start — Zero Code Changes

```bash
# Record your agent (wraps the process, captures all LLM calls)
litmus run python my_agent.py

# View the trace
litmus view ./traces/lt-abc123.trace.json
```

Your agent code stays completely unchanged. Litmus patches the SDK transport layer at runtime.

## What It Does

### Free (works offline, no account needed)

**Record** — Intercepts every HTTP call to LLM APIs (Anthropic, OpenAI, Mistral, 14+ providers). Saves the full request and response as a trace file. API keys are automatically redacted.

**View** — Pretty-print traces with step-by-step details, latency, and model info.

### Coming Soon (Litmus Cloud)

**Replay** — Feed recorded responses back to your agent. Same code path, same output, no real API calls.

**Fault Injection** — Mutate recorded responses to test resilience. What happens when Claude refuses? When GPT returns a 500? When the API times out?
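As a rough illustration of what mutating a recorded response could look like, here is a minimal sketch. The trace step layout (`request`, `response`, `status`, `body`) and both helper names are assumptions made for this example; they are not the actual Litmus trace schema or API.

```python
import copy


def inject_http_error(step: dict, status: int = 500) -> dict:
    """Return a copy of a recorded step whose response is a server error."""
    faulty = copy.deepcopy(step)  # never mutate the recorded original
    faulty["response"] = {"status": status, "body": {"error": "injected fault"}}
    return faulty


def inject_refusal(step: dict, text: str = "I can't help with that.") -> dict:
    """Return a copy of a recorded step whose model output is a refusal."""
    faulty = copy.deepcopy(step)
    faulty["response"]["body"] = {"content": text, "stop_reason": "end_turn"}
    return faulty


# Hypothetical recorded step, invented for this example.
recorded_step = {
    "request": {"model": "example-model", "prompt": "Summarize the report"},
    "response": {
        "status": 200,
        "body": {"content": "Here is the summary.", "stop_reason": "end_turn"},
    },
}

error_step = inject_http_error(recorded_step)
refusal_step = inject_refusal(recorded_step)
```

Working on deep copies keeps the original recording intact, so one trace can feed many fault scenarios.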

**CI Gating** — Score your trace corpus for reliability and block deploys that drop below a threshold.
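The README does not say how the reliability score is computed. Purely as an assumption for illustration, the sketch below treats the score as the fraction of traces that passed and turns it into a process exit code a CI job could act on. The function names and the 0.9 default threshold are hypothetical.

```python
def reliability_score(outcomes: list[bool]) -> float:
    """Fraction of traces that passed; an empty corpus scores 0.0."""
    if not outcomes:
        return 0.0
    return sum(outcomes) / len(outcomes)


def gate(outcomes: list[bool], threshold: float = 0.9) -> int:
    """Return an exit code: 0 lets the deploy through, 1 blocks it."""
    return 0 if reliability_score(outcomes) >= threshold else 1


# Nine passing traces out of ten meets a 0.9 threshold exactly.
exit_code = gate([True] * 9 + [False], threshold=0.9)
```

In a CI script the returned value would typically be passed to `sys.exit()` so the pipeline step fails when the gate blocks.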

Join the [Discord](https://discord.gg/fA2SHvHb2D) to get notified when these features launch.

## Three Ways to Record

### 1. CLI Wrapper (recommended — zero code changes)

```bash
litmus run python my_agent.py
```

### 2. One-Line Python API

```python
import litmus

litmus.record()
# ... your existing agent code, unchanged ...
litmus.stop()
```

### 3. Proxy Mode (any language, advanced use)

```bash
litmus proxy --mode record
# Then point your SDK:
ANTHROPIC_BASE_URL=http://localhost:8787/anthropic python my_agent.py
```

## Supported Providers

Litmus supports the following LLM providers out of the box (in-process recording requires an `httpx`-based SDK; see Limitations):

| Provider | Status |
|----------|--------|
| Anthropic (Claude) | Tested |
| OpenAI (GPT) | Tested |
| Google (Gemini) | Supported |
| Mistral | Supported |
| Cohere | Supported |
| Groq | Supported |
| Together AI | Supported |
| Fireworks AI | Supported |
| DeepSeek | Supported |
| Perplexity | Supported |
| OpenRouter | Supported |
| Ollama (local) | Supported |
| vLLM (local) | Supported |
| LM Studio (local) | Supported |

**Custom/self-hosted models:**

```bash
litmus proxy --provider my-model=https://my-finetuned-llama.example.com/v1
```

## CLI Reference

```
litmus run         Wrap a command to record (zero code changes)
litmus view        Pretty-print a trace file
litmus proxy       Start the recording proxy server
litmus providers   List all supported providers
litmus replay      Replay a trace (coming soon, requires Litmus Cloud)
litmus ci          Score traces and gate deploys (coming soon, requires Litmus Cloud)
```

## How It Works

Litmus monkey-patches the `httpx` transport layer used by both the Anthropic and OpenAI Python SDKs. When you call `client.messages.create(...)`, Litmus intercepts the HTTP request before it leaves your machine.
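To make the interception idea concrete, here is a minimal sketch of class-level monkey-patching. It uses a stand-in transport class rather than real `httpx`, so it runs with the standard library alone. `StandInTransport`, `install_recorder`, and the in-memory `captured` list are hypothetical names for this example. Real `httpx` transports do expose a `handle_request` method, but exactly what Litmus patches is not documented here.

```python
import functools


class StandInTransport:
    """Stand-in for an HTTP transport (NOT the real httpx class)."""

    def handle_request(self, request: dict) -> dict:
        # Pretend to hit the network and return a response.
        return {"status": 200, "body": f"echo:{request['url']}"}


captured = []  # in-memory stand-in for a trace file


def install_recorder(transport_cls):
    """Patch handle_request on the class so every instance is intercepted."""
    original = transport_cls.handle_request

    @functools.wraps(original)
    def recording_handle_request(self, request):
        response = original(self, request)  # the real call still goes through
        captured.append({"request": request, "response": response})
        return response

    transport_cls.handle_request = recording_handle_request
    return original  # returned so the patch can be undone later


def uninstall_recorder(transport_cls, original):
    """Restore the unpatched method."""
    transport_cls.handle_request = original


original = install_recorder(StandInTransport)
transport = StandInTransport()  # the "agent code" is unchanged
result = transport.handle_request({"url": "https://api.example.com/v1/messages"})
uninstall_recorder(StandInTransport, original)
```

Because the patch is applied to the class rather than to one instance, clients created anywhere in the process are covered, which is what makes a zero-code-change wrapper possible.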

**Record mode:** The real API call goes through. Litmus captures the request and response, then saves them to a trace file. API keys are automatically redacted.

**Replay mode (coming soon):** The real API is never called. Litmus serves the recorded response directly from the trace file, so your agent gets the exact same response it got during recording: same tool calls, same content, same stop reason.
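Serving responses in recorded order can be sketched as follows. The `SequentialReplayer` class and the step layout are assumptions for illustration, not the Litmus implementation. The `matched` flag is also an addition of this sketch, included to show how a replay that diverges from the recording could be detected.

```python
class SequentialReplayer:
    """Serve recorded responses strictly in the order they were captured."""

    def __init__(self, steps):
        self._steps = list(steps)
        self._cursor = 0

    def next_response(self, request):
        """Return (recorded_response, matched) for the next step in the trace."""
        if self._cursor >= len(self._steps):
            raise RuntimeError("trace exhausted: more calls than were recorded")
        step = self._steps[self._cursor]
        self._cursor += 1
        # The response is chosen by position only; the comparison just
        # reports whether the live request equals the recorded one.
        matched = step["request"] == request
        return step["response"], matched


# Hypothetical two-step trace, invented for this example.
recorded = [
    {"request": {"prompt": "plan"},
     "response": {"content": "step 1", "stop_reason": "tool_use"}},
    {"request": {"prompt": "finish"},
     "response": {"content": "done", "stop_reason": "end_turn"}},
]

replayer = SequentialReplayer(recorded)
first, first_matched = replayer.next_response({"prompt": "plan"})
second, second_matched = replayer.next_response({"prompt": "something else"})
```

The second call sends a different request than the one recorded, yet still receives the second recorded response. That is the mismatch risk noted under Limitations for agents that change their call order on replay.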

## Security

- API keys (`Authorization`, `x-api-key`) are **automatically redacted** from trace headers
- Use `--compact` to strip request bodies for smaller trace files
- Note: message content in request/response bodies is NOT redacted — don't include secrets in your prompts
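A minimal sketch of header redaction along the lines described above. The `redact_headers` function and the `[REDACTED]` placeholder are assumptions for this example, not the actual Litmus code. Header names are compared case-insensitively because HTTP header names are case-insensitive.

```python
SENSITIVE_HEADERS = {"authorization", "x-api-key"}


def redact_headers(headers: dict[str, str]) -> dict[str, str]:
    """Return a copy of the headers with credential values replaced."""
    return {
        name: "[REDACTED]" if name.lower() in SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }


safe = redact_headers({
    "Authorization": "Bearer sk-example",
    "X-Api-Key": "example-key",
    "Content-Type": "application/json",
})
```

Only headers are touched in this sketch. As the note above says, anything placed in the prompt or response body would still appear in the trace.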

## Limitations

- **Python only** — the monkey-patch approach (`litmus run`, `litmus.record()`) requires Python. Use proxy mode for other languages.
- **httpx-based SDKs** — in-process recording works only with SDKs that use `httpx` under the hood (Anthropic, OpenAI, Mistral, Cohere, etc.). SDKs built on `requests` or `aiohttp` are not intercepted.
- **Sequential replay** — responses are served in recorded order. Agents that make calls in a different order on replay will get mismatched responses.
- **No tool call recording** — only LLM API calls are captured. External tool calls (database, HTTP APIs) are not recorded.

## Community

- [Discord](https://discord.gg/Nmr6tBx4xQ) — fastest way to get help, share traces, and request features
- [GitHub Issues](https://github.com/rylinjames/litmus/issues) — bug reports and feature requests
- [PyPI](https://pypi.org/project/litmus-trace/) — package

## Talk to Me

I'm building Litmus in the open and I want to hear from you — whether it's a bug, a feature idea, or just telling me about your agent setup. I personally respond to everything.

- **Email:** romirj@gmail.com
- **Discord:** romirj ([join the server](https://discord.gg/Nmr6tBx4xQ))
- **Twitter/X:** [@romir_jain](https://twitter.com/romir_jain)

If you're running agents in production and want to use Litmus, I'll personally help you set it up. DM me anywhere.

## Why Litmus?

**Observability tools** (LangSmith, Langfuse) tell you what happened. They log traces.

**Litmus captures the full picture.** Every LLM call, every response, every token — in a structured trace file you can inspect, share, and (soon) replay deterministically with fault injection.

LangSmith is the dashcam. Litmus is building the crash test facility.

## License

MIT