{"id":40006003,"url":"https://github.com/sauravvenkat/forkline","last_synced_at":"2026-02-28T02:26:59.606Z","repository":{"id":332689041,"uuid":"1134607413","full_name":"sauravvenkat/forkline","owner":"sauravvenkat","description":"Forkline is a replay-first tracing and diffing library for agentic AI workflows that lets you deterministically reproduce, fork, and compare agent runs to find exactly where behavior diverged.","archived":false,"fork":false,"pushed_at":"2026-02-21T17:48:05.000Z","size":163,"stargazers_count":2,"open_issues_count":5,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-02-21T23:25:52.702Z","etag":null,"topics":["agentic-ai","ai-infrastructure","cli-tools","deterministic-replay","developer-tools","llm-debugging","ml-infrastructure","trace-diffing"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sauravvenkat.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":"docs/ROADMAP.md","authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-01-15T00:24:14.000Z","updated_at":"2026-02-21T17:44:46.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/sauravvenkat/forkline","commit_stats":null,"previous_names":["sauravvenkat/forkline"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/sauravvenkat/forkline","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sauravvenkat%2Fforkline","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sauravvenkat%2Fforkline/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sauravvenkat%2Fforkline/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sauravvenkat%2Fforkline/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sauravvenkat","download_url":"https://codeload.github.com/sauravvenkat/forkline/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sauravvenkat%2Fforkline/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29736236,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-23T02:24:00.660Z","status":"ssl_error","status_checked_at":"2026-02-23T02:22:56.087Z","response_time":90,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agentic-ai","ai-infrastructure","cli-tools","deterministic-replay","developer-tools","llm-debugging","ml-infrastructure","trace-diffing"],"created_at":"2026-01-19T02:05:44.492Z","updated_at":"2026-02-23T03:24:57.909Z","avatar_url":"https://github.com/sauravvenkat.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/assets/forkline-wordmark.svg\" alt=\"Forkline\" max-width=\"100%\"/\u003e\n\u003c/p\u003e\n\n**Forkline** is a **local-first, replay-first tracing and diffing library for agentic AI workflows**.\n\nIts purpose is simple and strict:\n\n\u003e **Make agent runs reproducible, inspectable, and diffable.**\n\nForkline treats nondeterminism as something to be **controlled**, not merely observed.\n\n---\n\n## Why Forkline exists\n\nModern agentic systems fail in a frustrating way:\n\n- The same prompt behaves differently on different days\n- Tool calls change silently\n- Debugging becomes guesswork\n- CI becomes flaky or meaningless\n\nLogs and dashboards tell you *that* something changed.  \nForkline is built to tell you **where**, **when**, and **why**.\n\n---\n\n## What Forkline does\n\nForkline allows you to:\n\n- **Record** an agent run as a deterministic, local artifact\n- **Replay** that run without re-invoking the LLM ✅\n- **Diff** two runs and detect the **first point of divergence** ✅\n- **Capture tool calls** safely with deterministic redaction\n- **Use agent workflows in CI** without network calls or flakiness\n\nThis turns agent behavior into something you can reason about like code.\n\n---\n\n## Replay (Deterministic)\n\nReplay in Forkline means:\n\n- **Offline execution** — No network calls, no LLM invocations during replay\n- **Artifact injection** — Tool and LLM outputs come from recorded artifacts, not live calls\n- **First-divergence detection** — Comparison halts at the first observable difference\n- **Read-only** — Replay never mutates the original recording\n- **Deterministic** — Same inputs always produce identical comparison results\n\n```python\nfrom forkline import SQLiteStore, ReplayEngine, ReplayStatus\n\nstore = SQLiteStore()\nengine = ReplayEngine(store)\n\n# Record a run (see docs/RECORDING_V0.md)\n# ...\n\n# Compare two recorded runs\nresult = engine.compare_runs(\"baseline-run\", \"current-run\")\n\nif result.status == ReplayStatus.MATCH:\n    print(\"Runs are identical\")\nelif result.status == ReplayStatus.DIVERGED:\n    print(f\"Diverged at step {result.divergence.step_idx}: {result.divergence.divergence_type}\")\n```\n\nSee [`docs/REPLAY_ENGINE_V0.md`](docs/REPLAY_ENGINE_V0.md) for full replay documentation.\n\n---\n\n## Quick Start\n\n```bash\n# Install (editable)\npip install -e .\n\n# Run a script under forkline tracing\nforkline run examples/minimal.py\n\n# List recorded runs\nforkline list\n\n# Replay a run (prints summary)\nforkline replay \u003crun_id\u003e\n\n# Diff two runs\nforkline diff \u003crun_id_a\u003e \u003crun_id_b\u003e\n```\n\n### CLI Reference\n\n```bash\n# Run a script and capture metadata (timestamps, exit code, script path)\nforkline run examples/minimal.py\n# =\u003e run_id: 8a3f...\n\n# Pass arguments to the script (use -- to separate)\nforkline run examples/minimal.py -- --verbose --count 5\n\n# List runs (newest first, table format)\nforkline list\nforkline list --limit 10\nforkline list --json\n\n# Replay a run (load and summarize events)\nforkline replay \u003crun_id\u003e\nforkline replay \u003crun_id\u003e --json\n\n# Diff two runs (finds first divergence)\nforkline diff \u003crun_id_a\u003e \u003crun_id_b\u003e\nforkline diff \u003crun_id_a\u003e \u003crun_id_b\u003e --format json\n\n# Use a custom database path\nforkline run --db myproject.db examples/minimal.py\nforkline list --db myproject.db\n```\n\n### Example: catching LLM nondeterminism with Ollama Qwen3\n\n`examples/ollama_qwen3.py` calls Ollama's Qwen3 model and records the\ninput/output as forkline events. Run it twice — the LLM gives a different\nresponse each time, and `forkline diff` catches it.\n\n```bash\n# Prerequisites: ollama pull qwen3\n\n$ forkline run examples/ollama_qwen3.py\nCalling qwen3 ...\nResponse: A fork bomb is a denial-of-service attack that recursively spawns\nan infinite number of processes to exhaust system resources, causing a crash\nor severe performance degradation.\nrun_id: b015f49f45c04002a3c489fe84b45c5c\n\n$ forkline run examples/ollama_qwen3.py\nCalling qwen3 ...\nResponse: A fork bomb is a type of denial-of-service attack that recursively\nspawns an infinite number of processes using the fork() system call, thereby\nexhausting system resources and causing the system to crash or become\nunresponsive.\nrun_id: 7b08ac5e533d456daa7a24921c0d1687\n```\n\n**`forkline list`** — both runs, newest first:\n\n```\nID                                    Created               Script                          Status\n------------------------------------------------------------------------------------------------------\n7b08ac5e533d456daa7a24921c0d1687      2026-02-23 01:04:34   examples/ollama_qwen3.py        ok\nb015f49f45c04002a3c489fe84b45c5c      2026-02-23 01:04:20   examples/ollama_qwen3.py        ok\n```\n\n**`forkline replay b015f4...`** — summary of the first run:\n\n```\nRun: b015f49f45c04002a3c489fe84b45c5c\nScript: examples/ollama_qwen3.py\nStatus: ok\nDuration: 10.74s\nTotal events: 2\nEvents by type:\n  input: 1\n  output: 1\n```\n\n**`forkline diff b015f4... 7b08ac...`** — nondeterminism caught:\n\n```\nStep 1 diverged:\n  old.type: output\n  old.payload: {\"model\": \"qwen3\", \"response\": \"A fork bomb is a denial-of-service attack tha...\n  new.type: output\n  new.payload: {\"model\": \"qwen3\", \"response\": \"A fork bomb is a type of denial-of-service at...\n```\n\nSame prompt, same model — different output. That's exactly the problem Forkline exists to surface.\n\n### Programmatic API\n\n```python\nfrom forkline import ReplayEngine, SQLiteStore, ReplayStatus\n\nengine = ReplayEngine(SQLiteStore())\nresult = engine.compare_runs(\"baseline-run\", \"new-run\")\n\nif result.is_match():\n    print(\"No behavioral changes\")\nelse:\n    print(f\"Diverged: {result.divergence.summary()}\")\n```\n\nSee [`QUICKSTART_RECORDING_V0.md`](docs/QUICKSTART_RECORDING_V0.md) for recording and [`REPLAY_ENGINE_V0.md`](docs/REPLAY_ENGINE_V0.md) for replay.\n\n---\n\n## Design principles\n\nForkline is intentionally opinionated.\n\n- **Replay-first, not dashboards-first**\n- **Determinism over probabilistic insight**\n- **Local-first artifacts**\n- **Diff over metrics**\n- **Explicit schemas over implicit behavior**\n\nIf a feature does not help reproduce, replay, or diff an agent run, it does not belong in Forkline.\n\n---\n\n## Security \u0026 Data Redaction\n\nForkline is designed to be **safe by default** when handling sensitive data.\n\n### Core invariant\n\n\u003e **By default, Forkline artifacts MUST NOT contain recoverable sensitive user, customer, or proprietary data.**\n\nThis means:\n- **No raw LLM prompts or responses** are persisted by default\n- **Secrets are NEVER written to disk** in any mode\n- **PII and customer data** are redacted before persistence\n- **Redaction happens at capture time**, before any disk write\n\n### What IS recorded (SAFE mode)\n\nForkline preserves everything needed for replay and diffing:\n- Step ordering and control flow\n- Tool and model identifiers\n- Timestamps and execution metadata\n- **Stable cryptographic hashes** of redacted values\n- Structural shape of inputs/outputs\n\nThis enables deterministic replay, accurate diffing, and forensic debugging — without exposing sensitive data.\n\n### Escalation modes\n\nFor development and debugging, Forkline supports explicit opt-in modes:\n- **SAFE** (default): Production-safe, full redaction\n- **DEBUG**: Local development, raw values persisted\n- **ENCRYPTED_DEBUG**: Encrypted payloads for break-glass production debugging\n\n### Full policy\n\nFor the complete security design and redaction mechanisms, see:\n\n👉 [`docs/REDACTION_POLICY.md`](docs/REDACTION_POLICY.md)\n\n---\n\n## Why CLI-first\n\nForkline is **CLI-first by design**, not by convenience.\n\nAgent debugging and reproducibility are **developer workflows**.  \nThey live in terminals, CI pipelines, local machines, and code reviews — not dashboards.\n\n### Determinism and scriptability\nCLI commands are composable, automatable, and repeatable.\n\nThis makes Forkline usable in:\n- CI pipelines\n- test suites\n- local debugging loops\n- regression checks\n\nIf it can’t be scripted, it can’t be trusted as infrastructure.\n\n---\n\n### Local-first by default\nA CLI enforces Forkline’s local-first philosophy:\n- artifacts live on disk\n- runs replay offline\n- no hidden network dependencies\n- no opaque browser state\n\nThis keeps behavior inspectable and failure modes obvious.\n\n---\n\n### Diff is terminal-native\nDiffing is already how developers reason about change:\n- `git diff`\n- `pytest` failures\n- compiler diagnostics\n- performance regressions\n\nForkline extends this mental model to agent behavior.\n\nA CLI makes Forkline additive to existing tooling, not a replacement.\n\n---\n\n### Avoiding dashboard gravity\nDashboards optimize for:\n- aggregation over root cause\n- real-time metrics over replayability\n- visualization over determinism\n\nForkline explicitly avoids this gravity.\n\nIf a feature requires a UI to be understandable, it is usually hiding complexity rather than exposing truth.\n\n---\n\n### UIs can come later — CLIs must come first\nForkline does not reject UIs.  \nIt rejects **UI-first design**.\n\nThe CLI defines the real API surface and semantic contract.\nAny future UI must be a thin layer on top — never the other way around.\n\n\u003e Forkline is CLI-first because reproducibility, diffing, and trust are terminal-native problems.\n\n---\n\n## First-Divergence Diffing\n\nForkline can compare two recorded runs and identify the **first point of divergence** with deterministic classification, structured diffs, and a resync window that handles inserted/deleted steps.\n\n### CLI Usage\n\n```bash\n# Pretty diff (default)\nforkline diff run_a_id run_b_id\n\n# JSON diff\nforkline diff run_a_id run_b_id --format json\n\n# Custom database path\nforkline diff run_a_id run_b_id --db myproject.db\n```\n\n### Programmatic Usage\n\n```python\nfrom forkline import SQLiteStore\nfrom forkline.core.first_divergence import find_first_divergence, DivergenceType\n\nstore = SQLiteStore()\nrun_a = store.load_run(\"baseline\")\nrun_b = store.load_run(\"current\")\n\nresult = find_first_divergence(run_a, run_b)\n\nif result.status == DivergenceType.EXACT_MATCH:\n    print(\"Runs are identical\")\nelse:\n    print(f\"Diverged: {result.explanation}\")\n    print(f\"  Type: {result.status}\")\n    print(f\"  At: step {result.idx_a} (run_a) / step {result.idx_b} (run_b)\")\n    if result.output_diff:\n        for op in result.output_diff:\n            print(f\"  {op['op']} {op['path']}\")\n```\n\n### Sample Output\n\n```\nFirst divergence: output_divergence\n  Step 2 'generate_response': output differs (same input)\n\n  Run A step 2 'generate_response':\n    input_hash:  a1b2c3d4e5f6a7b8...\n    output_hash: 1234567890abcdef...\n    events: 3\n    has_error: False\n\n  Run B step 2 'generate_response':\n    input_hash:  a1b2c3d4e5f6a7b8...\n    output_hash: fedcba0987654321...\n    events: 3\n    has_error: False\n\n  Output diff:\n    replace $.result.text: \"Expected response\" -\u003e \"Different response\"\n\n  Last equal: step 1\n  Context A: [step 0 'init', step 1 'prepare', step 2 'generate_response']\n  Context B: [step 0 'init', step 1 'prepare', step 2 'generate_response']\n```\n\n### Divergence Types\n\n| Type | Meaning |\n|------|---------|\n| `exact_match` | Runs are identical |\n| `input_divergence` | Same step name, different input |\n| `output_divergence` | Same step name and input, different output |\n| `op_divergence` | Step names differ at same position |\n| `missing_steps` | Steps in run_a not present in run_b |\n| `extra_steps` | Steps in run_b not present in run_a |\n| `error_divergence` | Error state differs between steps |\n\n### How Resync Works\n\nWhen a mismatch is found, the engine searches within a configurable window (default 10 steps) for matching \"soft signatures\" `(step_name, input_hash)`. This correctly identifies inserted or deleted steps rather than reporting every subsequent step as divergent.\n\n---\n\n## What Forkline is NOT\n\nForkline explicitly does **not** aim to be:\n\n- **OpenTelemetry or distributed tracing** — No spans, traces, or exporters\n- **Production observability** — Not for real-time monitoring or alerting\n- **An evaluation or benchmarking framework** — Not for scoring or ranking models\n- **Prompt engineering tooling** — Not for A/B testing or prompt optimization\n- **A hosted SaaS or dashboard product** — Local-first, no cloud dependencies\n\nForkline is offline forensic debugging infrastructure, not an analytics or observability platform.\n\nFor recording schema details, see [`docs/RECORDING_V0.md`](docs/RECORDING_V0.md).\n\n---\n\n## Roadmap\n\nForkline follows a disciplined, execution-first roadmap.\n\nThe v0 series focuses on **correctness and determinism**, not polish.\n\n1. ✅ Deterministic run recording  \n2. ✅ Offline replay engine  \n3. ✅ First-divergence diffing  \n4. ✅ CLI (`run`, `list`, `replay`, `diff`)  \n5. CI-friendly deterministic mode  \n\nThe canonical roadmap and design contract live here:\n\n👉 [`docs/ROADMAP.md`](docs/ROADMAP.md)\n\n---\n\n## Status\n\nForkline is **early-stage and under active development**.\n\nAPIs are expected to change until `v1.0`.  \nFeedback is welcome, especially around replay semantics and diffing behavior.\n\n---\n\n## License\n\nForkline is licensed under the **Apache 2.0 License**.\n\n---\n\n## Philosophy (one sentence)\n\n\u003e Forkline exists because “it changed” is not a useful debugging answer.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsauravvenkat%2Fforkline","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsauravvenkat%2Fforkline","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsauravvenkat%2Fforkline/lists"}