{"id":48880236,"url":"https://github.com/varaddurge/argus","last_synced_at":"2026-06-15T14:00:30.831Z","repository":{"id":349740135,"uuid":"1199126891","full_name":"VaradDurge/ARGUS","owner":"VaradDurge","description":"CLI observability and debugging for agent workflows — catch silent/semantic failures, trace root causes, and replay from any step.   98.8% root cause accuracy across 100 controlled scenarios","archived":false,"fork":false,"pushed_at":"2026-06-11T09:32:00.000Z","size":6602,"stargazers_count":4,"open_issues_count":1,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-06-11T10:08:28.061Z","etag":null,"topics":["agent-workflows","ai-agents","cli","debugging","observability","replay"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/VaradDurge.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-02T04:23:10.000Z","updated_at":"2026-06-11T09:32:04.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/VaradDurge/ARGUS","commit_stats":null,"previous_names":["varaddurge/argus"],"tags_count":27,"template":false,"template_full_name":null,"purl":"pkg:github/VaradDurge/ARGUS","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VaradDurge%2FARGUS","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VaradDurge%2FARGUS/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VaradDurge%2FARGUS/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VaradDurge%2FARGUS/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/VaradDurge","download_url":"https://codeload.github.com/VaradDurge/ARGUS/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VaradDurge%2FARGUS/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34365597,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-15T02:00:07.085Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent-workflows","ai-agents","cli","debugging","observability","replay"],"created_at":"2026-04-16T02:30:30.595Z","updated_at":"2026-06-15T14:00:30.760Z","avatar_url":"https://github.com/VaradDurge.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"https://github.com/VaradDurge/ARGUS/blob/master/assets/Argus-NameTrans.png?raw=true\" width=\"480\"/\u003e\u003cbr/\u003e\n  \u003ca href=\"https://arguslabs.in\"\u003e\u003cimg src=\"https://img.shields.io/badge/website-arguslabs.in-6366f1\" alt=\"Website\"/\u003e\u003c/a\u003e\n  \u003ca href=\"https://pypi.org/project/argus-agents/\"\u003e\u003cimg src=\"https://img.shields.io/pypi/v/argus-agents\" alt=\"PyPI version\"/\u003e\u003c/a\u003e\n  \u003ca href=\"https://pypi.org/project/argus-agents/\"\u003e\u003cimg src=\"https://img.shields.io/badge/python-3.9%2B-blue\" alt=\"Python 3.9+\"/\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/VaradDurge/ARGUS/releases/tag/v0.5.0\"\u003e\u003cimg src=\"https://img.shields.io/badge/status-beta-6366f1\" alt=\"Beta\"/\u003e\u003c/a\u003e\n\u003c/div\u003e\n\n---\n\n**Production readiness platform for AI agent pipelines.**\n\nYour LangGraph pipeline runs. No exception. But three nodes later something crashes with a `KeyError`. The node that crashed didn't cause it — some node upstream returned a dict with a missing field, and nothing caught it.\n\nARGUS sits between your nodes and catches silent failures, semantic degradation, and contract violations before they reach production.\n\n\u003cimg src=\"https://github.com/VaradDurge/ARGUS/blob/master/assets/Argus_website.png?raw=true\" width=\"700\"/\u003e\n\n---\n\n## Install\n\n```bash\npip install argus-agents\n```\n\n## Setup — pick whichever fits your code\n\n**Option A — pass graph to constructor (recommended):**\n```python\nfrom argus import ArgusWatcher\n\nwatcher = ArgusWatcher(graph)      # attaches monitoring automatically\napp = graph.compile()\nresult = app.invoke(initial_state) # run auto-saves when the last node finishes\nprint(watcher.run_id)              # access the run ID directly\n```\n\n**Option B — separate watch call:**\n```python\nfrom argus import ArgusWatcher\n\nwatcher = ArgusWatcher()\nwatcher.watch(graph)       # before graph.compile()\napp = graph.compile()\nresult = app.invoke(initial_state)\n```\n\n**Option C — after compile (new in v0.5.0):**\n```python\nfrom argus import ArgusWatcher\n\nwatcher = ArgusWatcher()\napp = graph.compile(checkpointer=memory)\napp = watcher.watch_compiled(app)   # works on already-compiled graphs\nresult = app.invoke(initial_state)\n```\n\nAll three work. No changes to your node functions. Runs are saved automatically for linear and fan-out/fan-in graphs. Only cyclic graphs (with back-edges) need a manual `watcher.finalize()` call.\n\n### ArgusWatcher parameters\n\n| Parameter | Type | Default | Description |\n|-----------|------|---------|-------------|\n| `graph` | `StateGraph` | `None` | LangGraph graph to monitor. If passed, `watch()` is called automatically. |\n| `max_field_size` | `int` | `50_000` | Max characters per field before truncation in stored outputs. |\n| `validators` | `dict` | `None` | Per-node semantic validators. Use `\"*\"` as key to run on every node. Each validator is a `(bool, str)` callable. |\n| `strict` | `bool` | `False` | Enable extra checks: nested error keys, rate-limit responses, empty lists, type mismatches. Recommended for CI/staging. |\n| `investigate` | `bool \\| str` | `True` | LLM root-cause investigation. `True` = on failure only, `\"always\"` = every node, `False` = off. |\n| `redact_keys` | `set[str]` | `None` | Field names to redact from stored outputs (e.g. `{\"password\", \"api_key\"}`). |\n| `persist_state` | `bool` | `True` | Save run records to `.argus/runs/`. Set `False` for ephemeral monitoring. |\n| `record_http` | `bool` | `False` | Record all HTTP calls for deterministic replay. Saved to disk per run. |\n| `semantic_judge` | `bool` | `False` | Enable LLM-powered quality judge on every node output. Requires `OPENAI_API_KEY`. |\n| `judge_model` | `str` | `\"gpt-4o\"` | Model for the semantic judge and investigation. |\n\n```python\n# Example with multiple options\nwatcher = ArgusWatcher(\n    graph,\n    semantic_judge=True,\n    judge_model=\"gpt-4o-mini\",\n    strict=True,\n    record_http=True,\n    redact_keys={\"api_key\", \"token\"},\n    validators={\n        \"summarize\": lambda o: (len(o.get(\"summary\", \"\")) \u003e 10, \"Summary too short\"),\n    },\n)\n```\n\n---\n\n## What it catches\n\n**Silent failures** — a node returns `{}` or drops a required field. No exception, pipeline keeps running. ARGUS compares each node's output against the next node's type annotations and flags it before the crash happens downstream.\n\n**Semantic failures** — structure is fine but the value is wrong. Pass a validator:\n\n```python\nwatcher = ArgusWatcher(graph, validators={\n    \"classify\": lambda o: (o.get(\"label\") in [\"yes\", \"no\"], \"unexpected label\"),\n    \"*\":        lambda o: (\"error\" not in o, \"error key present\"),\n})\n```\n\n`\"*\"` runs on every node.\n\n**Crashes** — full traceback captured per node, with a one-line root cause:\n```\n└─  KeyError: 'score'\n└─  at pipeline.py:47  →  result = state[\"score\"] * weight\n└─  Field 'score' was absent from the incoming state\n```\n\n**Strict mode** — additional patterns: nested error keys, rate limit responses, empty required lists, `list[int]` vs `list[str]` type mismatches. Use in staging/CI:\n\n```python\nwatcher = ArgusWatcher(graph, strict=True)\n```\n\n---\n\n## Output\n\n```\nargus  run-abc12345  ·  2024-04-05 12:30  ·  1243 ms\nstatus  ●  silent_failure\n\n   1  fetch       43 ms    ✓  pass\n   2  validate    12 ms    ⚠  silent failure\n      └─  Field \"score\" is missing\n      └─  process received bad state\n   3  process    891 ms    ✗  crashed\n      └─  KeyError: 'score'\n      └─  Field 'score' was absent from the incoming state\n\nroot cause   validate\n```\n\nParallel nodes shown as a grouped panel. Cyclic graphs show each iteration separately. Human interrupt chains stitched into one trace on resume.\n\n---\n\n## Rerun\n\nA 10-node pipeline fails at node 7. You fix the bug. Instead of re-running nodes 1–6 and burning API credits:\n\n```bash\nargus replay \u003crun-id\u003e node_7\n```\n\nARGUS restores the exact state at node 7 from disk and runs from there. Upstream outputs stay frozen. Only node 7 onward re-executes with your fixed code.\n\nFrom the web UI — hover any step, click `↺ Rerun From Here`. After rerun, the diff view opens automatically.\n\n```bash\nargus diff \u003crerun-id\u003e    # compare rerun vs original\n```\n\n### What about external API calls?\n\nBy default, reruns call external APIs live (OpenAI, search tools, databases). Results may differ from the original run.\n\nFor **fully deterministic** reruns, record HTTP calls during the original run:\n\n```python\nwatcher = ArgusWatcher(graph, record_http=True)\n```\n\nEvery API response is saved to disk. During rerun, the recorded responses are served back — same data, zero extra cost, fully reproducible.\n\n---\n\n## Semantic Judge (LLM-powered)\n\nDeterministic checks catch ~80% of production failures (missing fields, empty results, type mismatches, placeholder outputs). For the remaining 20% — subtle quality issues like wrong tone, unhelpful responses, or outdated information — enable the semantic judge:\n\n```python\nwatcher = ArgusWatcher(graph, semantic_judge=True)\n```\n\nThe LLM judge runs **after** deterministic checks on every node. It evaluates output quality, generates causal hypotheses, and suggests debugging steps.\n\n```python\n# With a specific model\nwatcher = ArgusWatcher(graph, semantic_judge=True, judge_model=\"gpt-4o\")\n\n# Combined with HTTP recording for deterministic + intelligent monitoring\nwatcher = ArgusWatcher(graph, semantic_judge=True, record_http=True)\n```\n\nRequires `OPENAI_API_KEY` in your environment. Uses GPT-4o by default.\n\n**When to use:** complex multi-agent pipelines, customer-facing outputs, LLM-generated content where quality matters.\n\n**When to skip:** simple pipelines, CI/CD speed runs, zero-cost monitoring.\n\n---\n\n## Adaptive Learning (v0.6)\n\nARGUS learns from your runs. When the semantic judge discovers a new failure pattern, it proposes a candidate signature. You review it in the **Approvals** page (`argus ui`) and choose:\n\n- **Private** — adds to your local heuristic engine only\n- **Shared** — pushes to the cloud so every ARGUS user benefits\n\nThe heuristic engine loads from three tiers: **bundled** (ships with ARGUS) → **private** (your local patterns) → **shared** (community-contributed, synced from cloud). All three are merged and deduplicated at startup.\n\n```bash\nargus ui          # open Approvals page to review candidates\nargus login       # required for cloud sync\n```\n\nThe semantic judge also overrides heuristic false positives. If a node failed *only* due to a heuristic pattern match (no structural issues, no validator failures), the LLM reviews context and can clear the flag.\n\n---\n\n## Diagnose setup issues\n\n```bash\nargus doctor\n```\n\n```\n✓  python           Python 3.9.6\n✓  langgraph        langgraph 0.6.11\n✓  storage          312 runs stored, all healthy\n✓  replay           all 7 node functions importable for rerun\n✓  optional deps    openai (key set), dotenv\n```\n\n5 seconds to know if something is wrong — Python version, LangGraph compatibility, storage health, rerun readiness.\n\n---\n\n## CLI\n\n```\nargus list                          # all runs\nargus show last                     # most recent run\nargus show run \u003cid\u003e                 # by full id or 8-char prefix\nargus replay \u003cid\u003e \u003cnode\u003e            # re-run from a node\nargus replay \u003cid\u003e \u003cnode\u003e --only     # re-run just that one node\nargus inspect \u003cid\u003e --step \u003cnode\u003e    # raw input/output for a node\nargus diff \u003cid\u003e                     # rerun vs original\nargus diff \u003cid-a\u003e \u003cid-b\u003e            # any two runs\nargus ui                            # open web dashboard\nargus doctor                        # check your setup\nargus login                         # sync runs to cloud\n```\n\n---\n\n## Web UI\n\n```bash\nargus ui\n```\n\nOpens at `http://localhost:7842`. Serves runs from `.argus/runs/` in your current directory — no account needed.\n\nRun detail, rerun tree, side-by-side diff, LLM cost per node, AI root cause investigation.\n\n---\n\n## Node statuses\n\n| | |\n|---|---|\n| `✓` | pass |\n| `~` | pass with warnings (empty optional fields) |\n| `⚠` | silent failure (missing required fields) |\n| `⊗` | semantic fail (validator returned False) |\n| `⏸` | interrupted (human-in-the-loop pause) |\n| `✗` | crashed |\n\n---\n\n## Without LangGraph\n\n```python\nfrom argus import ArgusSession\n\nsession = ArgusSession()\nsession.set_edges({\"fetch\": [\"classify\"], \"classify\": [\"process\"]})\n\nfetch    = session.wrap(\"fetch\",    fetch_fn)\nclassify = session.wrap(\"classify\", classify_fn)\nprocess  = session.wrap(\"process\",  process_fn)\n\nstate = fetch(initial_state)\nstate = classify(state)\nstate = process(state)\nsession.finalize()\n```\n\nWorks with Prefect, Temporal, or plain Python functions.\n\n---\n\nRequires Python 3.9+. LangGraph 0.2+ only needed for `ArgusWatcher`.\n\n**v0.6.2** — [changelog](https://github.com/VaradDurge/ARGUS/releases/tag/v0.6.2)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvaraddurge%2Fargus","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvaraddurge%2Fargus","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvaraddurge%2Fargus/lists"}