{"id":50847769,"url":"https://github.com/taimoorkhan10/replayd","last_synced_at":"2026-06-14T11:03:57.775Z","repository":{"id":361390365,"uuid":"1254262014","full_name":"TaimoorKhan10/replayd","owner":"TaimoorKhan10","description":"Turn failed AI agent runs into replayable regression tests. Catch regressions before you ship.","archived":false,"fork":false,"pushed_at":"2026-05-30T12:24:54.000Z","size":20,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-30T13:13:51.165Z","etag":null,"topics":["agent-ops","agent-testing","ai-agents","ai-infrastructure","ai-reliability","llm-ops","llm-testing","open-source","prompt-testing","python","regression-testing","release-control","replay-testing","sdk"],"latest_commit_sha":null,"homepage":"https://www.stonepathlab.net/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TaimoorKhan10.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-30T10:49:30.000Z","updated_at":"2026-05-30T12:24:58.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/TaimoorKhan10/replayd","commit_stats":null,"previous_names":["taimoorkhan10/replayd"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/TaimoorKhan10/replayd","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TaimoorKhan10%2Freplayd","tags_url":"https://repos.e
cosyste.ms/api/v1/hosts/GitHub/repositories/TaimoorKhan10%2Freplayd/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TaimoorKhan10%2Freplayd/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TaimoorKhan10%2Freplayd/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/TaimoorKhan10","download_url":"https://codeload.github.com/TaimoorKhan10/replayd/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TaimoorKhan10%2Freplayd/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34318526,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-14T02:00:07.365Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent-ops","agent-testing","ai-agents","ai-infrastructure","ai-reliability","llm-ops","llm-testing","open-source","prompt-testing","python","regression-testing","release-control","replay-testing","sdk"],"created_at":"2026-06-14T11:03:51.135Z","updated_at":"2026-06-14T11:03:57.766Z","avatar_url":"https://github.com/TaimoorKhan10.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/banner.png\" alt=\"replayd — The same AI failure should not happen twice\" 
width=\"100%\"\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://pypi.org/project/replayd/\"\u003e\u003cimg src=\"https://img.shields.io/pypi/v/replayd?color=C08A3E\u0026label=pypi\" alt=\"PyPI\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://pypi.org/project/replayd/\"\u003e\u003cimg src=\"https://img.shields.io/badge/python-3.10%2B-blue\" alt=\"Python\"\u003e\u003c/a\u003e\n  \u003ca href=\"LICENSE\"\u003e\u003cimg src=\"https://img.shields.io/badge/License-MIT-green.svg\" alt=\"MIT License\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/TaimoorKhan10/replayd/graphs/contributors\"\u003e\u003cimg src=\"https://img.shields.io/github/contributors/TaimoorKhan10/replayd?color=3D7A5C\" alt=\"Contributors\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/TaimoorKhan10/replayd/actions/workflows/tests.yml\"\u003e\u003cimg src=\"https://github.com/TaimoorKhan10/replayd/actions/workflows/tests.yml/badge.svg\" alt=\"Tests\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/TaimoorKhan10/replayd/stargazers\"\u003e\u003cimg src=\"https://img.shields.io/github/stars/TaimoorKhan10/replayd?style=social\" alt=\"Stars\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cstrong\u003eYou fixed that agent bug last week. 
It came back today.\u003c/strong\u003e\u003cbr\u003e\n  replayd makes sure that never happens again.\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n  \u003ccode\u003epip install replayd\u003c/code\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://raw.githubusercontent.com/TaimoorKhan10/replayd/main/assets/replayd-flow.svg\" alt=\"replayd: failed run → capture → save as test → replay on change → PASS or FAIL\" width=\"860\"\u003e\n\u003c/p\u003e\n\n## Table of contents\n\n- [The problem](#the-problem)\n- [Who is replayd for](#who-is-replayd-for)\n- [How it works](#how-it-works)\n- [Quickstart](#quickstart)\n- [See it working](#see-it-working)\n- [Why replayd](#why-replayd)\n- [Framework integrations](#framework-integrations)\n- [How replayd compares](#how-replayd-compares)\n- [Example agents](#example-agents)\n- [Recording tool calls](#recording-tool-calls)\n- [Auto-instrumentation limitations](#auto-instrumentation-limitations)\n- [Grading](#grading)\n- [Storage](#storage)\n- [CI integration](#ci-integration)\n- [Design principles](#design-principles)\n- [Roadmap](#roadmap)\n- [FAQ](#faq)\n- [What replayd is not](#what-replayd-is-not)\n- [What builders say](#what-builders-say)\n- [Star goals](#star-goals)\n- [Part of TAQ by Stonepath Labs](#part-of-taq-by-stonepath-labs)\n- [Contributing](#contributing)\n- [Star history](#star-history)\n\n---\n\n## The problem\n\n| | Without replayd | With replayd |\n|---|---|---|\n| Agent fails in production | Fixed manually, forgotten | Saved as a replayable regression test |\n| You change a prompt or model | Hope the old failure does not return | Replay proves it cannot return |\n| Same bug comes back | Users catch it | Release is blocked before deploy |\n\n---\n\n## Who is replayd for\n\nreplayd is for teams shipping agents that can fail in ways they cannot afford to repeat:\n\n- customer support and refund approval agents\n- tool-calling and function-calling agents\n- RAG and retrieval 
agents\n- internal workflow and orchestration agents\n- coding, browser, and planning agents\n\nIf your agent can fail in a way you do not want repeated, replayd turns that failure into a test.\n\n---\n\n## How it works\n\nreplayd has three concepts: **Capture**, **Grade**, and **Gate**.\n\n```\n  YOUR AGENT FAILS IN PRODUCTION\n              │\n              ▼\n    ┌─────────────────────┐\n    │       CAPTURE       │  rp.capture() wraps the run.\n    │                     │  Records tool calls, input,\n    │   rp.capture(...)   │  output, and model used.\n    └──────────┬──────────┘\n               │\n               ▼\n    ┌─────────────────────┐\n    │        GRADE        │  You define what \"wrong\" means:\n    │                     │  forbidden_actions, expected_action,\n    │   rp.save_test(...) │  or a grader_prompt for semantic eval.\n    └──────────┬──────────┘\n               │\n               ▼\n    ┌─────────────────────┐\n    │        GATE         │  On every future change, replayd\n    │                     │  replays the test. If the failure\n    │  rp.replay_all(...) │  returns, the deploy is blocked.\n    └─────────────────────┘\n\n  One failed run → one saved test → no regression ever ships.\n```\n\nThat is the entire model. No new abstractions. No configuration DSL. Capture the real failure, define what wrong looks like, replay it before every ship.\n\n---\n\n## Quickstart\n\n```python\nfrom replayd import Replayd\n\nrp = Replayd()\n\n# 1. Capture a run — assign run.output inside the block\nwith rp.capture(input=user_input, model=\"gpt-4o\") as run:\n    run.output = your_agent.run(user_input)\n\n# Note: wrap your agent to record tool calls — see \"Recording tool calls\" below\n\n# 2. Mark it as failed\nrp.mark_failed(run.id, reason=\"agent approved refund after policy limit\")\n\n# 3. Save as a regression test\nrp.save_test(\n    run.id,\n    forbidden_actions=[\"approve_refund\"],\n    expected_action=\"escalate\",\n)\n\n# 4. 
Later — after changing your prompt or model — replay all tests\nresults = rp.replay_all(agent=your_agent_fn)\n\nfor r in results:\n    print(r.verdict, r.reason)\n```\n\n---\n\n## See it working\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://raw.githubusercontent.com/TaimoorKhan10/replayd/main/assets/demo.svg\" alt=\"replayd terminal demo\" width=\"860\"\u003e\n\u003c/p\u003e\n\nRun the included example (`python examples/basic_example.py`) and you get:\n\n```\nCapturing a refund-approval agent run...\n  agent called: approve_refund(amount=1200)  [policy limit is $500]\n  output: {'action': 'approve_refund', 'amount': 1200}\n\nMarking run as failed...\n  reason: agent approved refund of $1200, exceeding $500 policy limit\n\nSaving as regression test...\n  forbidden: approve_refund  |  expected: escalate\n\n-----------------------------------------\nReplay #1 -- buggy agent (regression should be caught)\n  [FAIL] Forbidden action 'approve_refund' was called during replay.\n\nReplay #2 -- fixed agent (regression should be resolved)\n  [PASS] No forbidden actions called; all expected actions present.\n-----------------------------------------\n1 failure caught. 1 resolved.\n```\n\nThe failure was captured, saved, replayed against a broken agent (FAIL), and replayed again against the fixed agent (PASS). That is the full loop.\n\n---\n\n## Why replayd\n\nAI agents do not only fail once. They regress. You change a prompt, a model, a tool schema, or a retrieval setup, and something that used to work quietly breaks again. Traditional software has regression tests and CI/CD to catch this. AI agents have had nothing equivalent.\n\nreplayd is the open source fix. Failed runs become replayable tests. Old failures cannot return undetected.\n\n---\n\n## Framework integrations\n\nreplayd is framework-agnostic. 
The capture-and-replay pattern works with any agent that can be wrapped as a Python callable.\n\n| Framework | Status | Example |\n|---|---|---|\n| Plain Python | ✅ Ready | `examples/basic_example.py` |\n| LangChain | ✅ Ready | `examples/langchain_tool_agent.py` |\n| OpenAI Agents SDK | ✅ Ready | `examples/openai_agents_sdk_example.py` |\n| CrewAI | 🔜 Planned | Contributions welcome |\n| AutoGen | 🔜 Planned | Contributions welcome |\n| LlamaIndex | 🔜 Planned | Contributions welcome |\n| DSPy | 🔜 Planned | Contributions welcome |\n| Semantic Kernel | 🔜 Planned | Contributions welcome |\n\nWorks with any LLM provider: OpenAI, Anthropic, Gemini, Groq, Mistral, or local models via Ollama. replayd does not call your LLM — it wraps your agent.\n\nAdding an integration? See [Contributing](#contributing).\n\n---\n\n## How replayd compares\n\n| | replayd | LangSmith | Braintrust | Langfuse |\n|---|---|---|---|---|\n| Turns failed runs into regression tests | ✅ | Partial | Partial | ❌ |\n| Replays known failures before deploy | ✅ | ❌ | ❌ | ❌ |\n| Active release gate | ✅ | ❌ | Partial | ❌ |\n| Zero runtime dependencies | ✅ | ❌ | ❌ | ❌ |\n| Open source core | ✅ | ❌ | ❌ | ✅ |\n| Framework agnostic | ✅ | ✅ | ✅ | ✅ |\n\nreplayd is not an alternative to observability tools. It works alongside them. LangSmith and Langfuse tell you what happened. replayd makes sure the worst things cannot happen again.\n\n---\n\n## Example agents\n\nSeven production-grade example agents are included. 
All except `examples/real_openai_agent.py` run with no API key, because their grading is structural.\n\n| Agent | What it catches |\n|---|---|\n| `examples/basic_example.py` | Refund approval exceeding the policy limit |\n| `examples/multi_step_planning_agent.py` | Finalizing a plan without first calling `check_constraints` (budget, deadline, dependencies) |\n| `examples/rag_policy_agent.py` | Approving a refund based on a deprecated policy chunk it should have ignored |\n| `examples/incident_response_agent.py` | Running `rollback_deploy` without first paging a human via `escalate_to_human` |\n| `examples/langchain_tool_agent.py` | Issuing a full refund on a partial defect — LangChain tool-calling integration pattern |\n| `examples/openai_agents_sdk_example.py` | Approving a high-risk merge without running a security scan — OpenAI Agents SDK pattern |\n| `examples/real_openai_agent.py` | Real OpenAI call with auto-instrumentation — requires `OPENAI_API_KEY` |\n\nRun the no-API-key examples:\n\n```bash\npython examples/multi_step_planning_agent.py\npython examples/rag_policy_agent.py\npython examples/incident_response_agent.py\n```\n\nEach example shows FAIL on the buggy agent and PASS on the fixed agent.\n\n---\n\n## Recording tool calls\n\n### Auto-instrumentation (recommended)\n\nCall `rp.instrument_openai(client)` or `rp.instrument_anthropic(client)` once, before entering any capture block. 
Tool calls are then recorded automatically — no manual wrapping needed.\n\n```python\nfrom openai import OpenAI\nfrom replayd import Replayd\n\nrp = Replayd()\nclient = OpenAI()\nrp.instrument_openai(client)  # call once\n\nwith rp.capture(input=user_query, model=\"gpt-4o\") as run:\n    run.output = your_agent(client, user_query)  # tool calls recorded automatically\n```\n\nWorks for Anthropic too:\n\n```python\nimport anthropic\nclient = anthropic.Anthropic()\nrp.instrument_anthropic(client)\n```\n\nSee `examples/real_openai_agent.py` for a complete runnable example.\n\n### Manual recording (framework-agnostic fallback)\n\nIf your agent does not use OpenAI or Anthropic directly, wrap your tool dispatcher to record calls manually. **The agent you pass to `replay_all` must accept two arguments: `(input, run_ctx)`.**\n\n```python\ndef my_agent(input, run_ctx):\n    result = call_tool(\"search\", {\"query\": input[\"query\"]})\n    run_ctx.record_tool_call(\"search\", {\"query\": input[\"query\"]}, result)\n    # ... rest of agent logic\n    return final_output\n```\n\nPass this two-argument callable to `replay_all`:\n\n```python\nresults = rp.replay_all(agent=my_agent)\n```\n\n### Turning instrumentation off\n\n```python\nrp.uninstrument_openai(client)\nrp.uninstrument_anthropic(client)\n```\n\nBoth calls are idempotent. After them the client is exactly as it was before `instrument_*` was called. Useful in test teardown to avoid cross-test pollution.\n\n## Auto-instrumentation limitations\n\n**What is covered**\n\n| Client | Capture | Replay via `replay_all` |\n|---|---|---|\n| `OpenAI` (sync) | ✅ | ✅ |\n| `AsyncOpenAI` | ✅ | use sync wrapper¹ |\n| `Anthropic` (sync) | ✅ | ✅ |\n| `AsyncAnthropic` | ✅ | use sync wrapper¹ |\n| Streaming (`stream=True`) | ❌ warn + fallback | ❌ warn + fallback |\n\n¹ `replay_all` calls agents synchronously. 
Wrap async agents with `asyncio.run()` for replay:\n\n```python\nimport asyncio\n\ndef sync_wrapper(input, run_ctx):\n    return asyncio.run(my_async_agent(input, run_ctx))\n\nresults = rp.replay_all(agent=sync_wrapper)\n```\n\n**Streaming (`stream=True`) — not supported**\n\nWhen `stream=True` is passed inside an active capture block, the wrapper emits a warning via `warnings.warn()` and passes the call through unchanged, so tool calls are not recorded. Disable streaming for captured runs, or record manually:\n\n```python\nrun_ctx.record_tool_call(\"tool_name\", arguments, result)\n```\n\n**Final tool call with no follow-up model call**\n\nTool calls are recorded when the result arrives back as a `role: \"tool\"` message in the next API call. If your agent executes the last tool, uses the result in Python code, and never sends it back to the model, that call is not recorded. Use `record_tool_call()` for it explicitly.\n\nThe pattern that is fully covered without any manual work:\n\n```python\n# sync shown; with an async client, await the create() call\nwhile True:\n    response = client.chat.completions.create(\n        model=\"gpt-4o\", messages=messages, tools=tools\n    )\n    msg = response.choices[0].message\n    if msg.tool_calls:\n        messages.append(msg)  # the assistant turn must precede its tool results\n        for tc in msg.tool_calls:\n            # tc.function.arguments is a JSON string; parse it in execute_tool if needed\n            result = execute_tool(tc.function.name, tc.function.arguments)\n            messages.append({\"role\": \"tool\", \"tool_call_id\": tc.id, \"content\": str(result)})\n    else:\n        break  # final answer\n```\n\n## Grading\n\nreplayd does **not** grade on exact output matching. LLMs are non-deterministic: the same correct behavior can produce different output text on every run, so exact matching creates false failures. The wrong tool being called, however, is a fact. 
replayd grades on facts.\n\n| Failure type | Grading method |\n|---|---|\n| Wrong tool called, wrong argument, wrong state | Deterministic assertion — no LLM needed, never flaky |\n| Policy violated, wrong reasoning, bad decision | LLM-as-judge via `grader_prompt` |\n\nThe structural check always runs first. If a forbidden action fires, the test fails immediately without calling the LLM.\n\n### Semantic grading\n\nFor failures that can only be evaluated by reading the output:\n\n```python\nrp.save_test(\n    run.id,\n    grader_prompt=\"Did the agent approve a refund that exceeds the $500 policy limit?\",\n)\n```\n\nRequires:\n\n```bash\npip install \"replayd[semantic]\"\nexport ANTHROPIC_API_KEY=sk-...\n```\n\n---\n\n## Storage\n\nRuns and tests are stored as JSON files in `.replayd/` in your working directory:\n\n```\n.replayd/\n  runs/\u003crun-id\u003e.json    ← full record of each captured run\n  tests/\u003ctest-id\u003e.json  ← saved regression tests\n```\n\nNo database. No hosted backend. Commit `.replayd/tests/` into version control to share regression tests with your team — this gives your team a traceable record of every known agent failure and its expected behavior, an audit trail that grows with every failure your agents encounter in production. Keep `.replayd/runs/` out of git — it is local capture data.\n\n---\n\n## CI integration\n\nA ready-to-use script is included at `scripts/regression_check.py`. Copy it into your repo, replace the agent import, and add this to your workflow:\n\n```yaml\n# .github/workflows/regression.yml\n- name: Run regression tests\n  run: python scripts/regression_check.py\n```\n\nAny saved regression test that fails exits with code 1, blocking the deploy. This makes replayd a policy enforcement gate in your CI pipeline — any agent behavior that violates a known expectation blocks the release before it reaches users.\n\n---\n\n## Design principles\n\nThese four principles drive every decision in replayd.\n\n**1. 
Grade behavior, not output.**\nLLM output is non-deterministic by design. Grading the exact text a model returned is fragile and creates noise. replayd grades what the agent *did* — which tools it called, in what order, with what arguments. That is deterministic. That is what matters.\n\n**2. Capture from real failures, not from specs.**\nMost evaluation tools ask you to write tests from a specification. replayd captures from actual failures. A test that comes from a real production failure is worth ten that were written hypothetically. Real failures encode the exact context, input, and state that caused the problem.\n\n**3. Zero dependencies on the critical path.**\nThe core capture-and-replay loop requires no external services, no hosted backend, no LLM call. You can build up a full regression suite entirely offline. The LLM-as-judge grader is opt-in and only runs when deterministic grading is insufficient.\n\n**4. One correct action beats a correct output.**\nA good agent escalates when it should escalate. It calls the right tool in the right order. The exact phrasing of its explanation is secondary. replayd enforces what the agent *did*, not how it phrased the result.\n\n---\n\n## Roadmap\n\n| Status | Item |\n|---|---|\n| ✅ Done | Core capture → grade → gate loop |\n| ✅ Done | Deterministic (structural) grading |\n| ✅ Done | LLM-as-judge semantic grading |\n| ✅ Done | LangChain integration example |\n| ✅ Done | OpenAI Agents SDK integration example |\n| 🔜 Next | Test grouping and tagging |\n| 🔜 Next | `replayd run` CLI — replay without writing a Python script |\n| 🔜 Next | CrewAI and AutoGen integration examples |\n| 🔜 Next | HTML test report output |\n| 🔜 Planned | Parallel replay execution |\n| 🔜 Planned | Replay diffing — compare two agent versions side by side |\n| 🔜 Planned | Test flakiness detection |\n| 🔜 Planned | LlamaIndex and DSPy native integrations |\n\nWant to help ship any of these? 
See [Contributing](#contributing).\n\n---\n\n## FAQ\n\n**Does replayd require an API key to use?**\nNo. The core capture-and-replay loop with structural (deterministic) grading runs with zero external dependencies. An API key is only needed if you use `grader_prompt` for semantic grading via LLM-as-judge (`pip install \"replayd[semantic]\"`).\n\n**Does it work with any LLM provider?**\nYes. replayd wraps your agent as a callable and never interacts with your LLM directly. OpenAI, Anthropic, Gemini, Groq, Mistral, or a local model via Ollama — the provider does not matter.\n\n**Does it work with any agent framework?**\nYes, if the framework can be wrapped as a two-argument Python callable `(input, run_ctx) -\u003e output`. LangChain and OpenAI Agents SDK examples are included. CrewAI and AutoGen patterns are planned.\n\n**Do my tests break if the model gives a different (but correct) output?**\nNo. Structural tests check tool calls, not output text. They will not produce false positives because the model rephrased a correct answer differently. Semantic grading also evaluates meaning, not exact text.\n\n**Should I commit `.replayd/` to git?**\nCommit `.replayd/tests/` — this is your regression suite and should be shared with your team. Do not commit `.replayd/runs/` — these are local capture files and should stay out of version control.\n\n**How is this different from prompt testing tools like PromptFoo?**\nPromptFoo and similar tools help you evaluate prompt quality on hypothetical test cases you write upfront. replayd captures *real production failures* and turns them into regression tests. The workflow is capture-first, not specification-first. The tests come from reality, not from what you imagined could go wrong.\n\n**Can I run replayd in CI without any secrets?**\nYes. As long as you use only structural grading, replayd runs with no secrets at all. 
If you use `grader_prompt`, you need `ANTHROPIC_API_KEY` set in your CI environment.\n\n---\n\n## What replayd is not\n\nreplayd is not an observability tool. LangSmith, Braintrust, and Arize tell you what happened after the fact. replayd is an **active release gate** — it replays known failures before you ship. Passive vs active. That is the distinction.\n\n---\n\n## What builders say\n\n\u003e \"If something solved this it would definitely be worth paying for.\" — r/ycombinator\n\n\u003e \"Replaying old failures against new prompts and models should be standard at this point. Otherwise the same bugs just keep coming back quietly.\" — r/LLMDevs\n\n\u003e \"The capture step has too much friction. There's your next action item.\" — r/LLMDevs\n\n---\n\n## Star goals\n\n[![GitHub Stars](https://img.shields.io/github/stars/TaimoorKhan10/replayd?style=social)](https://github.com/TaimoorKhan10/replayd/stargazers)\n\n| Milestone | Stars |\n|---|---|\n| 🌱 Seedling | 50 |\n| 🌿 Growing | 100 |\n| 🚀 Momentum | 250 |\n| 💫 Community | 500 |\n| 🏆 Established | 1,000 |\n\nEvery star helps more builders find replayd. If it has saved you from a regression, star it.\n\n---\n\n## Part of TAQ by Stonepath Labs\n\nreplayd is the open source core of [TAQ](https://stonepathlab.net) — the full AI release control platform.\n\nTAQ adds: a dashboard, hosted backend, team access controls, release gate enforcement, and audit logs. replayd gets your team started with the concept. TAQ is what you run it on in production.\n\n**[stonepathlab.net](https://stonepathlab.net)**\n\n---\n\n## Contributing\n\nBug reports and pull requests are welcome. 
Open an issue on GitHub to discuss anything before sending a large PR.\n\nThe build has no dependencies — `pip install -e \".[dev]\"` gives you everything needed to run tests:\n\n```bash\npip install -e \".[dev]\"\npytest\n```\n\n**Good first contributions:**\n- Add a CrewAI integration example\n- Add an AutoGen integration example\n- Add a LlamaIndex integration example\n- Add regression scenarios for a real agent type\n- Improve the getting started documentation\n- Build the `replayd run` CLI command\n\n---\n\n## Star history\n\n[![Star History Chart](https://api.star-history.com/svg?repos=TaimoorKhan10/replayd\u0026type=Date)](https://star-history.com/#TaimoorKhan10/replayd\u0026Date)\n\n---\n\n## License\n\nMIT — see [LICENSE](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftaimoorkhan10%2Freplayd","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftaimoorkhan10%2Freplayd","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftaimoorkhan10%2Freplayd/lists"}