{"id":35121496,"url":"https://github.com/lukecarr/litmus","last_synced_at":"2026-01-13T22:01:49.604Z","repository":{"id":330746315,"uuid":"1123764848","full_name":"lukecarr/litmus","owner":"lukecarr","description":"Specification testing for structured LLM responses. ","archived":false,"fork":false,"pushed_at":"2026-01-10T21:07:01.000Z","size":42,"stargazers_count":3,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-01-11T05:53:46.595Z","etag":null,"topics":["llm-comparison","llm-testing","openrouter","specification-test"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lukecarr.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":["lukecarr"]}},"created_at":"2025-12-27T15:16:45.000Z","updated_at":"2026-01-10T20:31:40.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/lukecarr/litmus","commit_stats":null,"previous_names":["lukecarr/litmus"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/lukecarr/litmus","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lukecarr%2Flitmus","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lukecarr%2Flitmus/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lukecarr%2Flitmus/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lukecarr%2Flitmus/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lukecarr","download_url":"https://codeload.github.com/lukecarr/litmus/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lukecarr%2Flitmus/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28400411,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-13T14:36:09.778Z","status":"ssl_error","status_checked_at":"2026-01-13T14:35:19.697Z","response_time":56,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["llm-comparison","llm-testing","openrouter","specification-test"],"created_at":"2025-12-28T00:04:34.843Z","updated_at":"2026-01-13T22:01:49.598Z","avatar_url":"https://github.com/lukecarr.png","language":"Go","funding_links":["https://github.com/sponsors/lukecarr"],"categories":[],"sub_categories":[],"readme":"# Litmus\n\nSpecification testing for structured LLM outputs.\n\nLitmus lets you define test cases with input strings and expected JSON outputs, run them against LLM models via OpenRouter, and compare accuracy, latency, and throughput across models.\n\n## Example output\n\n```plain\n$ litmus run --tests example/tests.json --schema example/schema.json --prompt-file example/prompt.txt --model openai/gpt-4.1-nano --model mistralai/mistral-nemo                 \nRunning 2 tests against openai/gpt-4.1-nano...\nRunning 2 tests against mistralai/mistral-nemo...\n\nLitmus Test Report\n──────────────────────────────────────────────────\nTimestamp: 2025-12-27T16:19:30Z\nTest File: example/tests.json\nSchema:    example/schema.json\n\nModel: openai/gpt-4.1-nano\n──────────────────────────────────────────────────\nProvider: OpenAI\nResults:  2 passed / 0 failed (100.0% accuracy)\nTokens:   148 in / 34 out\nLatency:  P50=363ms  P95=454ms  P99=462ms\nDuration: 2.11s (16.1 tok/s)\n\n┌────────────────────────┬────────┬─────────┬────────┐\n│          TEST          │ STATUS │ LATENCY │ TOKENS │\n├────────────────────────┼────────┼─────────┼────────┤\n│ Extract person info    │ ✓ PASS │ 263ms   │ 74/17  │\n│ Extract another person │ ✓ PASS │ 464ms   │ 74/17  │\n└────────────────────────┴────────┴─────────┴────────┘\n\nModel: mistralai/mistral-nemo\n──────────────────────────────────────────────────\nProvider: Mistral\nResults:  2 passed / 0 failed (100.0% accuracy)\nTokens:   64 in / 56 out\nLatency:  P50=254ms  P95=262ms  P99=263ms\nDuration: 763ms (73.4 tok/s)\n\n┌────────────────────────┬────────┬─────────┬────────┐\n│          TEST          │ STATUS │ LATENCY │ TOKENS │\n├────────────────────────┼────────┼─────────┼────────┤\n│ Extract person info    │ ✓ PASS │ 246ms   │ 32/28  │\n│ Extract another person │ ✓ PASS │ 263ms   │ 32/28  │\n└────────────────────────┴────────┴─────────┴────────┘\n\nModel Comparison\n──────────────────────────────────────────────────\n┌────────────────────────┬──────────┬──────────┬──────────────┬─────────┬────────┐\n│         MODEL          │ PROVIDER │ ACCURACY │ P 50 LATENCY │ TOK / S │ TOKENS │\n├────────────────────────┼──────────┼──────────┼──────────────┼─────────┼────────┤\n│ openai/gpt-4.1-nano    │ OpenAI   │ 100.0%   │ 363ms        │ 16.1    │ 182    │\n│ mistralai/mistral-nemo │ Mistral  │ 100.0%   │ 254ms        │ 73.4    │ 120    │\n└────────────────────────┴──────────┴──────────┴──────────────┴─────────┴────────┘\n```\n\n## Installation\n\nDownload a pre-built binary from the [latest release](https://github.com/lukecarr/litmus/releases/latest), or install with Go:\n\n```bash\ngo install go.carr.sh/litmus@latest\n```\n\nOr compile from source:\n\n```bash\ngit clone https://github.com/lukecarr/litmus.git\ncd litmus\ngo build -o litmus .\n```\n\n## Quick Start\n\n1. Set your OpenRouter API key:\n\n```bash\nexport OPENROUTER_API_KEY=\"your-api-key\"\n```\n\n2. Create a test file (`tests.json`):\n\n```json\n[\n  {\n    \"name\": \"Extract person info\",\n    \"input\": \"John Smith is 30 years old and works at Acme Corp\",\n    \"expected\": {\n      \"name\": \"John Smith\",\n      \"age\": 30,\n      \"company\": \"Acme Corp\"\n    }\n  },\n  {\n    \"name\": \"Extract another person\",\n    \"input\": \"Jane Doe, age 25, is employed by TechStart Inc\",\n    \"expected\": {\n      \"name\": \"Jane Doe\",\n      \"age\": 25,\n      \"company\": \"TechStart Inc\"\n    }\n  }\n]\n```\n\n3. Create a JSON schema (`schema.json`):\n\n```json\n{\n  \"type\": \"object\",\n  \"properties\": {\n    \"name\": { \"type\": \"string\" },\n    \"age\": { \"type\": \"integer\" },\n    \"company\": { \"type\": \"string\" }\n  },\n  \"required\": [\"name\", \"age\", \"company\"],\n  \"additionalProperties\": false\n}\n```\n\n4. Create a prompt file (`prompt.txt`):\n\n```plain\nExtract the person's name, age, and company from the given text.\n```\n\n5. Run tests:\n\n```bash\nlitmus run --tests tests.json --schema schema.json --prompt-file prompt.txt --model openai/gpt-4.1-nano\n```\n\n## Usage\n\n### Basic Command\n\n```bash\nlitmus run --tests \u003ctest-file\u003e --schema \u003cschema-file\u003e --prompt \u003cprompt\u003e --model \u003cmodel\u003e\n```\n\n### Flags\n\n| Flag | Short | Description |\n|------|-------|-------------|\n| `--tests` | `-t` | Path to test cases JSON file (required) |\n| `--schema` | `-s` | Path to JSON schema file (required) |\n| `--prompt` | `-p` | System prompt for the LLM |\n| `--prompt-file` | | Path to file containing system prompt |\n| `--model` | `-m` | Model to test against (required, can be repeated) |\n| `--parallel` | `-P` | Number of parallel requests per model (default: 1) |\n| `--output` | `-o` | Output format: `terminal`, `json`, or `html` (default: `terminal`) |\n| `--api-key` | | OpenRouter API key (or use OPENROUTER_API_KEY env var) |\n\n### Examples\n\n**Single model:**\n\n```bash\nlitmus run \\\n  --tests tests.json \\\n  --schema schema.json \\\n  --prompt-file prompt.txt \\\n  --model openai/gpt-4.1-nano\n```\n\n**Multiple models for comparison:**\n\n```bash\nlitmus run \\\n  --tests tests.json \\\n  --schema schema.json \\\n  --prompt \"Extract entities from the text\" \\\n  --model openai/gpt-4.1-nano \\\n  --model mistralai/mistral-nemo\n```\n\n**Parallel execution:**\n\n```bash\nlitmus run \\\n  --tests tests.json \\\n  --schema schema.json \\\n  --prompt-file prompt.txt \\\n  --model openai/gpt-4.1-nano \\\n  --parallel 5\n```\n\n**JSON output for CI/CD:**\n\n```bash\nlitmus run \\\n  --tests tests.json \\\n  --schema schema.json \\\n  --prompt-file prompt.txt \\\n  --model openai/gpt-4.1-nano \\\n  --output json \u003e results.json\n```\n\n**HTML report:**\n\n```bash\nlitmus run \\\n  --tests tests.json \\\n  --schema schema.json \\\n  --prompt-file prompt.txt \\\n  --model openai/gpt-4.1-nano \\\n  --output html \u003e report.html\n```\n\n## Test File Format\n\nThe test file is a JSON array of test cases:\n\n```json\n[\n  {\n    \"name\": \"Test name (for display)\",\n    \"input\": \"The input text to send to the LLM\",\n    \"expected\": {\n      \"field1\": \"expected value\",\n      \"field2\": 123\n    }\n  }\n]\n```\n\n- `name`: A human-readable name for the test case\n- `input`: The user message sent to the LLM\n- `expected`: The expected JSON output (must match the schema)\n\n## JSON Schema\n\nThe schema file should be a valid [JSON Schema](https://json-schema.org/). It is passed to OpenRouter's `response_format` parameter to enforce structured output from the LLM.\n\nExample schema:\n\n```json\n{\n  \"type\": \"object\",\n  \"properties\": {\n    \"sentiment\": {\n      \"type\": \"string\",\n      \"enum\": [\"positive\", \"negative\", \"neutral\"]\n    },\n    \"confidence\": {\n      \"type\": \"number\",\n      \"minimum\": 0,\n      \"maximum\": 1\n    }\n  },\n  \"required\": [\"sentiment\", \"confidence\"],\n  \"additionalProperties\": false\n}\n```\n\n## Output\n\nLitmus supports three output formats via the `--output` flag:\n\n- `terminal` (default): Colored, formatted output for the terminal\n- `json`: Machine-readable JSON for CI/CD pipelines\n- `html`: Self-contained HTML report for sharing and archiving\n\n### Terminal Output\n\nThe terminal output includes:\n\n- Provider used for each model\n- Summary metrics (pass/fail counts, accuracy %)\n- Token usage and throughput (tokens/second)\n- Latency percentiles (P50, P95, P99)\n- Detailed test results table\n- Field-level diff for failures\n- Model comparison table (when testing multiple models)\n\n### JSON Output\n\nUse `--output json` to get machine-readable output:\n\n```json\n{\n  \"timestamp\": \"2025-12-27T16:19:30Z\",\n  \"prompt\": \"Extract entities...\",\n  \"schema_file\": \"schema.json\",\n  \"test_file\": \"tests.json\",\n  \"models\": [\n    {\n      \"model\": \"openai/gpt-4.1-nano\",\n      \"results\": [...],\n      \"metrics\": {\n        \"total_tests\": 10,\n        \"passed\": 9,\n        \"failed\": 1,\n        \"accuracy\": 90.0,\n        \"latency_p50_ms\": 450,\n        \"throughput_tps\": 25.5\n      }\n    }\n  ]\n}\n```\n\n### HTML Output\n\nUse `--output html` to generate a self-contained HTML report:\n\n```bash\nlitmus run \\\n  --tests tests.json \\\n  --schema schema.json \\\n  --prompt-file prompt.txt \\\n  --model openai/gpt-4.1-nano \\\n  --output html \u003e report.html\n```\n\nThe HTML report includes all the same information as the terminal output, formatted for viewing in a browser. It's self-contained with no external dependencies, making it easy to share or archive.\n\n![HTML Report Screenshot](https://github.com/user-attachments/assets/0f2ba956-de27-42fa-9e06-42bda13412b0)\n\n## Exit Codes\n\n- `0`: All tests passed\n- `1`: One or more tests failed or errored\n\n## Supported Models\n\nLitmus works with any model available on [OpenRouter](https://openrouter.ai/models).\n\n## License\n\nLitmus is licensed under the [MIT License](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flukecarr%2Flitmus","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flukecarr%2Flitmus","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flukecarr%2Flitmus/lists"}