{"id":50392707,"url":"https://github.com/globulus/checkirai","last_synced_at":"2026-05-30T19:01:10.980Z","repository":{"id":355488446,"uuid":"1227249527","full_name":"globulus/checkirai","owner":"globulus","description":"Checkir AI is a spec-driven verification runtime using local LLMs","archived":false,"fork":false,"pushed_at":"2026-05-12T11:11:18.000Z","size":1367,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-12T13:11:58.822Z","etag":null,"topics":["ai-testing","cli","llm","local-ai","mcp","ollama","testing","web-testing"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/globulus.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-02T12:22:33.000Z","updated_at":"2026-05-12T11:11:22.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/globulus/checkirai","commit_stats":null,"previous_names":["globulus/checkirai"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/globulus/checkirai","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/globulus%2Fcheckirai","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/globulus%2Fcheckirai/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/globulus%2Fcheckirai/releases","manifests_url":"https://repos.ecosyste.m
s/api/v1/hosts/GitHub/repositories/globulus%2Fcheckirai/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/globulus","download_url":"https://codeload.github.com/globulus/checkirai/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/globulus%2Fcheckirai/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33705207,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-30T02:00:06.278Z","response_time":92,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-testing","cli","llm","local-ai","mcp","ollama","testing","web-testing"],"created_at":"2026-05-30T19:01:10.179Z","updated_at":"2026-05-30T19:01:10.966Z","avatar_url":"https://github.com/globulus.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"```text\n  / __|  | || |   | __|   / __|  | |/ /   |_ _|   | _ \\     o O O  /   \\   |_ _|\n | (__   | __ |   | _|   | (__   | ' \u003c     | |    |   /    o       | - |    | |\n  \\___|  |_||_|   |___|   \\___|  |_|\\_\\   |___|   |_|_\\   TS__[O]  |_|_|   |___|\n_|\"\"\"\"\"|_|\"\"\"\"\"|_|\"\"\"\"\"|_|\"\"\"\"\"|_|\"\"\"\"\"|_|\"\"\"\"\"|_|\"\"\"\"\"| {======|_|\"\"\"\"\"|_|\"\"\"\"\"|\n\"`-0-0-'\"`-0-0-'\"`-0-0-'\"`-0-0-'\"`-0-0-'\"`-0-0-'\"`-0-0-'./o--000'\"`-0-0-'\"`-0-0-'\n```\n\n\u003e **Local LLMs + MCP-backed 
tools = test your builds locally without paying token costs!**\n\u003e Parse specs, plan probes, collect evidence and return requirement-level verdicts.\n\n[![Build](https://github.com/globulus/checkirai/actions/workflows/node.js.yml/badge.svg?branch=main)](https://github.com/globulus/checkirai/actions/workflows/node.js.yml) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n\n**Checkir AI** is a spec-driven verification runtime: it reads a human-readable spec, plans probes, runs tools (including MCP-backed capabilities) and returns **requirement-level verdicts** (pass, fail, inconclusive, or blocked) with evidence you can inspect offline. LLM-assisted phases default to **Ollama** on your machine; you can also use a **`remote`** provider (OpenAI-compatible HTTP API) for normalization and judging. Tool hosts connect over **MCP** the same way Cursor or Claude Code talks to other servers.\n\n![GIF demo](./fixtures/demo.gif)\n\n---\n\n## Why?\n\n- **Tokens and API calls add up.** Re-running the same “does the UI match the spec?” loop through a cloud model is slow and expensive.\n- **Repeatable runs** write structured reports, SQLite state and artifacts under a known output root, which suits CI, dashboards and agent loops. You can also **restart a run from a saved phase**, further saving time and cost.\n- **Local-first verification** keeps sensitive URLs, traces, and artifacts on your machine while still using an LLM where judgment helps (planning, interpretation). 
Optional **remote** LLMs use your API key and base URL (see `docs/USAGE.md` and `checkirai.config.json`), but the main idea is still to run this locally.\n- **CLI and MCP as the integration surface** lets Cursor, Claude Code and other hosts treat verification as a first-class tool alongside Chrome DevTools, filesystem, etc.\n\nNote that doing the same task (spec-based software verification using tools) with a remote agent is almost certainly going to be faster, but **each and every run incurs a cost that local checks don't**. Plus, unlike with a remote agent, you only need to run the spec -\u003e IR -\u003e executable plan pipeline once: the local artifacts let you re-run only the judgement/verification phase, further conserving resources.\n\n---\n\n## Use cases\n\n- **Agent implement → verify → fix:** After a coding agent changes an app, call `verify` (or the MCP `verify_spec` tool) and feed failures back into the next edit.\n- **Human acceptance checks:** Maintain a markdown spec next to the repo; run verification before merge or release.\n- **Exploratory “what would we test?”:** Use `suggest_probe_plan` over MCP to plan probes without executing a full run.\n- **Local model hygiene:** Use `ollama status`, `model list`, `model suggest`, and `model pull` so the right instruct/tool-capable model is available before a run.\n- **Chrome DevTools MCP wiring:** Use `chrome-devtools list-tools` / `self-check` to confirm your MCP server exposes the expected tool surface (see `checkirai.config.json` for project defaults).\n- **Dart/Flutter MCP wiring:** Use `dart-mcp list-tools` / `self-check` and the `fixtures/flutter_app` + `fixtures/flutter-spec.md` showcase for `run_tests` and driver-style verification.\n\n---\n\n## Requirements\n\n- **Node.js** 22 or newer (`engines` in `package.json`)\n- **pnpm** (recommended; scripts assume it)\n- **Ollama** (optional but default for LLM-assisted phases); install separately and start the daemon\n\n---\n\n## 
Installation\n\n### 1. Clone and install dependencies\n\n```bash\ngit clone https://github.com/globulus/checkirai\ncd checkirai\npnpm install\n```\n\n`postinstall` runs a TypeScript build so the `checkirai` bin can load `dist/`.\n\n### 2. Make the CLI available globally (pick one)\n\nFrom the repo root:\n\n```bash\npnpm link --global\n```\n\nOr install this package globally:\n\n```bash\npnpm add -g .\n```\n\nConfirm the binary is on your `PATH` (pnpm’s global bin):\n\n```bash\npnpm bin -g\ncheckirai --help\n```\n\n**Note:** The published entrypoint is `bin/checkirai.js` and delegates to `dist/`. If you see a build error, run `pnpm build` manually.\n\n### 3. Optional project config\n\nCopy or edit `checkirai.config.json` (or `.checkirai/config.json`) in your project root for:\n\n- **`defaults`** — `targetUrl`, `tools`, `outRoot`, optional **`profile`** (selects a key from **`profiles`** for LLM overrides), plus runtime tuning: `maxRunMs`, `runCommandAllowlist` (prefix with `*` or full command line; **empty means no `run_command` runs**), **`allowShellMetacharacters`** (opt-in to shell metacharacters in allowlisted commands), `stepRetries`, `stepRetryDelayMs`, `isolateProbeSessions` (one session per probe), `artifactMaxRuns` (prune old per-run artifact folders).\n- **`llm`** — Shared: **`ollamaHost`**, **`allowAutoPull`**, **`requireToolCapable`**. Per role **`normalizer`**, **`plannerAssist`**, **`judge`**, **`triage`**: each has **`provider`** (`ollama` \\| `remote` \\| `none`), **`model`**, optional **`fallbackModel`**, **`temperature`**, **`maxRetries`**, **`timeoutMs`**, and when **`remote`**: **`remoteBaseUrl`**, **`remoteApiKey`**. There is no single global `ollamaModel: \"auto\"`; each role names an explicit model tag (defaults ship in `src/llm/types.ts` and the sample config).\n- **`profiles`** — Optional map (e.g. 
`laptop_16gb`) of partial per-role overrides merged on top of **`llm`** when **`defaults.profile`** or **`CHECKIRAI_PROFILE`** is set.\n- **`mcpServers`** — e.g. `chrome-devtools` or `dart-mcp` with `command` / `args` so **`checkirai verify`** can spawn the matching MCP server when `--tools` includes that integration token.\n\n---\n\n## Web dashboard\n\nThe repo ships a **local web UI** plus a small API so you can kick off runs, watch progress, and browse results without living only in the terminal. **`checkirai.config.json`** **`defaults`** (timeouts, `runCommandAllowlist`, retries, isolation, artifact pruning, `allowShellMetacharacters`) are merged server-side when the request omits them. The **LLM** tab edits the full **`LlmPolicy`** (per-role providers, models, fallbacks, temperatures, Ollama host, auto-pull, require-tool-capable) and sends that object on **`verify_spec`**; the API still **`mergeLlmPolicyWithProjectProfile`** with the file, so **`profiles`** / **`defaults.profile`** apply on top. The **General** tab lists verifier **capability** ids (what probes may use) alongside the comma-separated **`tools`** tokens (what integrations are enabled). 
The **`model_catalog`** API (and **Model catalog** in the UI) includes a **`hardware`** block: total system RAM from the **API host** (`os.totalmem()`), a suggested **`profiles.*`** key (`laptop_16gb` / `workstation_24gb` / `high_end_40gb` per the LLM implementation plan), a RAM-filtered recommended model list, and—when that profile exists in the project file—a merged **`previewLlmPolicy`** you can apply in the dashboard.\n\n| Mode                        | Command          | Notes                                                                        |\n| --------------------------- | ---------------- | ---------------------------------------------------------------------------- |\n| Development (API + Vite UI) | `pnpm web:dev`   | UI: `http://127.0.0.1:5173` · API health: `http://127.0.0.1:8787/api/health` |\n| Production build            | `pnpm web:build` | Builds TypeScript + Vite static assets                                       |\n| Production serve            | `pnpm web:start` | Serves built UI + API (`SERVE_STATIC_FROM=web/dist`)                         |\n\nFor day-to-day work, `pnpm web:dev` is the usual choice.\n\n---\n\n## CLI commands\n\nTop-level program: **`checkirai`** (aliases in `package.json`: `spec-driven-verifier`, `verify-app` → same binary).\n\n### `checkirai verify`\n\nVerify a target URL against a markdown spec (or restart from a previous run).\n\n| Option                                       | Description                                                                                                                                                                                                                                                        |\n| -------------------------------------------- | 
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |\n| `--spec \u003cpath\u003e`                              | Path to spec markdown (required unless restarting from `spec_ir` / `llm_plan` with `--restart-run`)                                                                                                                                                                |\n| `--target \u003curl\u003e`                             | Base URL of the app under test (**required**)                                                                                                                                                                                                                      |\n| `--tools \u003clist\u003e`                             | Comma-separated: `playwright-mcp`, `shell`, `fs`, `http`, `chrome-devtools`, `dart-mcp` (default `fs,http`)                                                                                                                                                         |\n| `--dart-project-root \u003curi\u003e`                  | Dart/Flutter project root (`file:` URI or absolute path) when using `dart-mcp`                                                                                                                                                                                      |\n| `--dart-driver-device \u003cid\u003e`                  | Optional device id for `launch_app` preflight (driver-style runs)                                                                                                                                                                                                   |\n| `--out \u003cdir\u003e`                                | Output root (default `.verifier`)                                      
                                                                                                                                                                                            |\n| `--policy \u003cname\u003e`                            | `read_only` or `ui_only`                                                                                                                                                                                                                                           |\n| `--llm-provider \u003cp\u003e`                         | `ollama`, `remote`, or `none` (default `ollama`). **`none`** turns off all four roles. **`remote`** is not fully selectable from flags alone—put per-role **`remote*`** fields in **`checkirai.config.json`**; the CLI does not pass API keys on the command line. |\n| `--ollama-host \u003curl\u003e`                        | Ollama HTTP API base URL (default `http://127.0.0.1:11434`); merged into policy for Ollama roles.                                                                                                                                                                  |\n| `--ollama-model \u003cname\u003e`                      | When set, overrides the **`model`** tag for **all** roles that use Ollama after config merge; omit to keep per-role models from **`checkirai.config.json`** / code defaults.                                                                                       
|\n| `--allow-auto-pull` / `--no-allow-auto-pull` | Allow pulling missing Ollama models                                                                                                                                                                                                                                |\n| `--restart-from \u003cphase\u003e`                     | `start` · `spec_ir` · `llm_plan`                                                                                                                                                                                                                                   |\n| `--restart-run \u003crunId\u003e`                      | Parent run UUID when restarting                                                                                                                                                                                                                                    |\n\n**Exit codes:** `0` pass · `1` fail · `2` inconclusive · `3` blocked.\n\n### `checkirai ollama status`\n\nCheck that Ollama is reachable.\n\n| Option         | Default                  |\n| -------------- | ------------------------ |\n| `--host \u003curl\u003e` | `http://127.0.0.1:11434` |\n\n### `checkirai model list`\n\nList installed Ollama models.\n\n| Option         | Default                  |\n| -------------- | ------------------------ |\n| `--host \u003curl\u003e` | `http://127.0.0.1:11434` |\n\n### `checkirai model suggest`\n\nPrint recommended models (structured / tool-friendly output).\n\n| Option                       | Default                                    |\n| ---------------------------- | ------------------------------------------ |\n| `--tooling` / `--no-tooling` | Prefer tooling-capable models (default on) |\n\n### `checkirai model pull \u003cmodelName\u003e`\n\nDownload a model via Ollama’s HTTP API (e.g. 
`llama3.1:8b-instruct`).\n\n| Option         | Default                  |\n| -------------- | ------------------------ |\n| `--host \u003curl\u003e` | `http://127.0.0.1:11434` |\n\n### `checkirai chrome-devtools list-tools`\n\nSpawn a Chrome DevTools MCP server process and log the tools it exposes.\n\n| Option            | Description                                        |\n| ----------------- | -------------------------------------------------- |\n| `--command \u003ccmd\u003e` | **Required** — executable to launch the MCP server |\n| `--args \u003cargs\u003e`   | Space-separated arguments (optional)               |\n| `--cwd \u003ccwd\u003e`     | Working directory (default: current directory)     |\n\n### `checkirai chrome-devtools self-check`\n\nVerify the Chrome DevTools MCP server exposes the expected tool surface.\n\n| Option            | Description  |\n| ----------------- | ------------ |\n| `--command \u003ccmd\u003e` | **Required** |\n| `--args \u003cargs\u003e`   | Optional     |\n| `--cwd \u003ccwd\u003e`     | Optional     |\n\n### `checkirai dart-mcp list-tools`\n\nSpawn the Dart/Flutter MCP server process and log the tools it exposes.\n\n| Option            | Description                                        |\n| ----------------- | -------------------------------------------------- |\n| `--command \u003ccmd\u003e` | **Required** — executable to launch the MCP server |\n| `--args \u003cargs\u003e`   | Space-separated arguments (optional)               |\n| `--cwd \u003ccwd\u003e`     | Working directory (default: current directory)     |\n\n### `checkirai dart-mcp self-check`\n\nVerify the Dart MCP server exposes the expected tool surface.\n\n| Option            | Description  |\n| ----------------- | ------------ |\n| `--command \u003ccmd\u003e` | **Required** |\n| `--args \u003cargs\u003e`   | Optional     |\n| `--cwd \u003ccwd\u003e`     | Optional     |\n\n---\n\n## MCP server and Cursor\n\nCheckir AI exposes an **MCP server** (stdio) so 
editors and agents can call verification as tools instead of shelling out.\n\n- **Implementation:** `src/interfaces/mcp/server.ts` (`startMcpServer()`)\n- **Tools:** verification (`verify_spec`, `restart_verify_spec`, `suggest_probe_plan`, `list_capabilities`), run inspection (`get_report`, `get_run_graph`, `get_artifact`, `explain_failure`), and Ollama helpers (`ollama_status`, `model_list`, `model_suggest`, `model_pull`, `model_ensure`)\n\nStart the server locally (stdio):\n\n```bash\npnpm mcp\n```\n\nOptional: set `CHECKIRAI_OUT` to override the verifier output root (default `.verifier`).\n\n**Cursor:** register the MCP server with **`node`** and an **absolute** path to **`dist/src/interfaces/mcp/bin.js`**, plus **`cwd`** on the clone root (built by `pnpm install` / `pnpm build`); or `pnpm --silent mcp`. Do **not** use plain `pnpm mcp` in the editor (script banner on stdout breaks MCP stdio). If `args` use a relative `dist/...` path and you see **`Cannot find module '/Users/…/dist/...'`** (home or wrong prefix), Cursor did not apply `cwd` to the child—switch the script path in `args` to an absolute path. If you use `--import tsx` and see **`ERR_MODULE_NOT_FOUND` for `tsx`**, fix `cwd` or install deps in that clone. 
See **[docs/USAGE.md](docs/USAGE.md)** for JSON snippets and `verify_spec` examples.\n\nFor end-to-end examples, probe output layout, and integration notes, **`docs/USAGE.md`** is the detailed guide.\n\n---\n\n## Development scripts\n\n| Script                            | Purpose                       |\n| --------------------------------- | ----------------------------- |\n| `pnpm build`                      | Compile TypeScript to `dist/` |\n| `pnpm dev`                        | Run CLI via tsx (`--help`)    |\n| `pnpm typecheck`                  | `tsc --noEmit`                |\n| `pnpm test`                       | Vitest                        |\n| `pnpm lint` / `pnpm lint:fix`     | Biome                         |\n| `pnpm format` / `pnpm format:fix` | Biome formatter               |\n| `pnpm mcp`                        | MCP server (stdio)            |\n\n---\n\n## Architecture overview\n\nEnd-to-end, a run is a **pipeline** from natural-language intent to a frozen result. An LLM (**Ollama** by default, or **`remote`**) is used where structure and judgment are needed; deterministic code handles orchestration, policies, and parts of scoring.\n\n1. **Spec in** — Markdown file, **Spec bundle** (inline markdown + URLs + files resolved to text), or a pre-built **Spec IR** object.\n2. **Normalize → Spec IR** — The configured LLM turns prose into a structured intermediate representation: requirements, observables, and metadata the rest of the system consumes. Outputs are **persisted** (e.g. `spec_ir` artifacts) so a run is auditable and replayable.\n3. **Plan → test plan** — The planner consults the **capability graph** for your `--tools` set (HTTP, filesystem, shell, Playwright / Chrome DevTools MCP, …). Verifier **capabilities** (e.g. `navigate`, `read_ui_structure`, `run_command`, `call_http`) are the atomic actions probes may request; see **`src/capabilities/types.ts`** (`ALL_CAPABILITY_NAMES`). 
An LLM (and/or procedural planners) produces executable steps aligned with what is actually available.\n4. **Execute** — The executor **bootstraps navigation** to the run’s target URL when Chrome + **`navigate`** are available (so snapshots are not taken against whatever tab was already open). Between probes it can **reset** to that URL again to shed UI mutations; optional **`isolateProbeSessions`** uses one session per probe. **`run_command`** is allowlist-gated (default deny if the list is empty) and rejects shell **metacharacters** unless **`defaults.allowShellMetacharacters`** is true. Optional **timeouts** and **step retries** cap hung or flaky work.\n5. **Judge, triage \u0026 synthesize** — **Deterministic checks** (including more observable kinds, URL from the page, HTTP evidence where present) and **LLM judges** (per-role Ollama or **remote**) assign per-requirement verdicts (`pass` / `fail` / `inconclusive` / `blocked`). Optional **post-run triage** uses the **`triage`** role. Optional **`depends_on`** on a requirement blocks dependents when a prerequisite fails. The runtime emits `report.json`, `summary.md`, and related rows for the dashboard and MCP tools.\n\n```mermaid\nflowchart LR\n  MD[Markdown or bundle] --\u003e N[LLM to Spec IR]\n  IR[Spec IR object] --\u003e P[Plan using tools]\n  N --\u003e P\n  P --\u003e X[Execute tools]\n  X --\u003e J[Judge]\n  J --\u003e S[Report and summary]\n```\n\n### Starting from a checkpoint (“phases”)\n\nYou do **not** have to redo every expensive step. 
A parent run stores artifacts; a child run can **restart from a saved phase** by passing **`--restart-run`** with the parent’s run id and **`--restart-from`**:\n\n| `--restart-from` | Meaning                                                                                                                                                                                                                                                                                              |\n| ---------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| **`start`**      | Full pipeline from spec input (default).                                                                                                                                                                                                                                                             |\n| **`spec_ir`**    | Reuse the parent’s **frozen Spec IR**—skip normalization LLM work; continue with planning and later stages.                                                                                                                                                                                          |\n| **`llm_plan`**   | Reuse the parent’s **saved test-plan artifact**—skip normalization and the main planning phase; continue with execution and judgement. Requires the same kind of setup as a full **MCP + LLM** generic loop (e.g. `chrome-devtools` or `dart-mcp` in `--tools` and an LLM provider other than `none`). |\n\nThe same **`restartFromPhase` / `restartFromRunId`** fields exist on **`verify_spec`** over MCP and on the web API; over MCP you can also call **`restart_verify_spec`** with **`parentRunId`** (and optional overrides). 
Pick the phase that matches how much of the parent run you want to reuse when iterating on plans, tooling, or judges.\n\n---\n\n## Status and contributing\n\nThis project is **work in progress**: behavior and APIs evolve **almost daily** as probes, judges and MCP integrations mature. If something is rough or undocumented, that is expected for now.\n\n**Contributions are welcome** — issues, specs, probe ideas, and PRs that tighten verification or docs all help. For deeper context on the current MVP scope (including known limitations) read **[docs/USAGE.md](docs/USAGE.md)**.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fglobulus%2Fcheckirai","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fglobulus%2Fcheckirai","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fglobulus%2Fcheckirai/lists"}