{"id":47898103,"url":"https://github.com/ivkond/litmus","last_synced_at":"2026-04-04T03:56:22.636Z","repository":{"id":346907913,"uuid":"1190934613","full_name":"ivkond/litmus","owner":"ivkond","description":"LLM coding agent benchmark TUI — run scenarios across Claude Code, Codex, Aider, KiloCode and more, then compare results","archived":false,"fork":false,"pushed_at":"2026-03-31T12:28:24.000Z","size":159,"stargazers_count":0,"open_issues_count":2,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-04T03:56:20.833Z","etag":null,"topics":["agents","ai","aider","benchmark","claude","code-generation","codex","coding-agents","evaluation","llm","python","testing","textual","tui"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ivkond.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-03-24T19:02:45.000Z","updated_at":"2026-03-25T23:06:48.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/ivkond/litmus","commit_stats":null,"previous_names":["ivkond/litmus"],"tags_count":6,"template":false,"template_full_name":null,"purl":"pkg:github/ivkond/litmus","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ivkond%2Flitmus","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ivkond%2Flitmus/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ivkond%2Flitmus/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ivkond%2Flitmus/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ivkond","download_url":"https://codeload.github.com/ivkond/litmus/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ivkond%2Flitmus/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31387024,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-04T01:22:39.193Z","status":"online","status_checked_at":"2026-04-04T02:00:07.569Z","response_time":60,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agents","ai","aider","benchmark","claude","code-generation","codex","coding-agents","evaluation","llm","python","testing","textual","tui"],"created_at":"2026-04-04T03:56:21.845Z","updated_at":"2026-04-04T03:56:22.621Z","avatar_url":"https://github.com/ivkond.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Litmus 🧪\n\n[![CI](https://github.com/ivkond/litmus/actions/workflows/ci.yml/badge.svg)](https://github.com/ivkond/litmus/actions/workflows/ci.yml)\n[![Security (Bandit)](https://github.com/ivkond/litmus/actions/workflows/bandit.yml/badge.svg)](https://github.com/ivkond/litmus/actions/workflows/bandit.yml)\n[![Security 
(OSV)](https://github.com/ivkond/litmus/actions/workflows/osv-scanner.yml/badge.svg)](https://github.com/ivkond/litmus/actions/workflows/osv-scanner.yml)\n[![PyPI](https://img.shields.io/pypi/v/litmus-llm)](https://pypi.org/project/litmus-llm/)\n[![Python](https://img.shields.io/pypi/pyversions/litmus-llm)](https://pypi.org/project/litmus-llm/)\n[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)\n\n**Terminal UI for running LLM agent scenarios and comparing their performance.**\n\nLitmus executes coding tasks across multiple AI agents and models, runs tests against the results, and produces detailed evaluation reports — all from a single TUI.\n\n## What it does\n\n1. **Detects agents** installed on your system (Claude Code, Codex, Aider, Cursor Agent, KiloCode, OpenCode)\n2. **Runs scenarios** — each scenario is a coding task with tests and scoring criteria\n3. **Evaluates results** — an LLM judge scores agent and model performance across 20 criteria each\n4. **Generates reports** — HTML reports with per-scenario breakdowns, logs, and scores\n\n## Supported agents\n\n| Agent | Binary | Model listing |\n|-------|--------|---------------|\n| Claude Code | `claude` | Built-in list |\n| Codex | `codex` | Built-in list |\n| OpenCode | `opencode` | `opencode models` |\n| KiloCode | `kilocode` | `kilocode models` |\n| Aider | `aider` | `aider --list-models` |\n| Cursor Agent | `agent` | `agent models` |\n\nLitmus auto-detects which agents are available and queries their model lists.\n\n## Quick start\n\nRequires **Python 3.12+**.\n\n```bash\npip install litmus-llm\nlitmus init      # create a workspace with a sample scenario\nlitmus           # open the TUI\n```\n\nOr run without installing via [uv](https://docs.astral.sh/uv/):\n\n```bash\nuvx --from litmus-llm litmus\n```\n\n### Development setup\n\n```bash\ngit clone https://github.com/ivkond/litmus.git\ncd litmus\nuv sync\nuv run litmus\n```\n\n### TUI workflow\n\n1. 
📋 **Models** — select agents and models to test\n2. 🧩 **Scenarios** — pick which coding tasks to run\n3. ▶️ **Run** — watch execution progress in real time\n4. 📊 **Analysis** — review LLM-judged scores\n5. 📄 **Reports** — browse generated HTML reports\n\n## How it works\n\nEach scenario lives in `template/\u003cid\u003e/` and contains:\n\n```\ntemplate/1-data-structure/\n  prompt.txt        # Task description sent to the agent\n  task.txt          # Detailed requirements\n  scoring.csv       # Evaluation criteria\n  project/          # Starter code with tests\n```\n\nExecution pipeline per scenario:\n\n```\nuv sync  -\u003e  agent call  -\u003e  pytest  -\u003e  collect logs\n```\n\nAfter all runs complete, an LLM judge evaluates the results using 20 agent criteria (tool efficiency, error recovery, reasoning depth...) and 20 model criteria (code correctness, instruction following, hallucination resistance...).\n\n## Configuration\n\nOn first launch, Litmus generates a config file with detected agents and their settings. 
Configure the analysis model (any OpenAI-compatible API) through the TUI settings screen.\n\n## Scenario packs\n\nLitmus supports exporting and importing scenario archives (`.litmus-pack` ZIP files) for sharing test suites between machines or teams.\n\n## Project structure\n\n```\nsrc/litmus/\n  __init__.py       # Entry point, workspace init\n  app.py            # Main app, menu screen\n  agents.py         # Agent registry, detection, model listing\n  run.py            # Scenario execution engine\n  analysis.py       # LLM-powered evaluation (20+20 criteria)\n  report.py         # HTML report generation\n  pack/             # Scenario export/import\n  screens/          # TUI screens (models, scenarios, run, results, analysis)\n```\n\n## Tech stack\n\n- [Textual](https://textual.textualize.io/) — TUI framework\n- [Rich](https://rich.readthedocs.io/) — terminal formatting\n- [Pydantic](https://docs.pydantic.dev/) — structured evaluation models\n- [OpenAI SDK](https://github.com/openai/openai-python) — LLM judge (any compatible API)\n\n## License\n\n[MIT](LICENSE)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fivkond%2Flitmus","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fivkond%2Flitmus","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fivkond%2Flitmus/lists"}