https://github.com/statsclaw/statsclaw
Paper in, package out. An agent teams framework that turns statistical papers into production-ready packages.
agent-teams causal-inference claude-code econometrics monte-carlo paper-to-package python-package r-package stata statistics
Last synced: 2 months ago
- Host: GitHub
- URL: https://github.com/statsclaw/statsclaw
- Owner: statsclaw
- Created: 2026-03-27T11:06:12.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2026-03-29T21:21:10.000Z (2 months ago)
- Last Synced: 2026-03-30T00:24:21.330Z (2 months ago)
- Topics: agent-teams, causal-inference, claude-code, econometrics, monte-carlo, paper-to-package, python-package, r-package, stata, statistics
- Homepage: https://statsclaw.ai
- Size: 477 KB
- Stars: 1
- Watchers: 0
- Forks: 2
- Open Issues: 3
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- Roadmap: ROADMAP.md
README
# StatsClaw
**A workflow framework for statistical package development.**
**An open-source tool that helps researchers build, test, and document statistical software packages with AI agent teams.**
[Website](https://statsclaw.ai) · [Roadmap](ROADMAP.md) · [Contributing](CONTRIBUTING.md) · [Discussions](https://github.com/statsclaw/statsclaw/discussions)
---
## What is StatsClaw?
StatsClaw is a framework for [Claude Code](https://claude.ai/code) that uses **AI agent teams** to assist with statistical package development. You describe what you need — a bug fix, a new feature, a cross-language translation — and StatsClaw coordinates multiple AI agents to help you build, test, and document the result. It works best when a domain expert stays in the loop to guide decisions.
---
## How It Works
StatsClaw orchestrates a team of **9 specialized AI agents**, each operating under strict information isolation:
| Agent | Role |
|:------|:-----|
| **Leader** | Orchestrates the workflow, dispatches agents, enforces isolation |
| **Planner** | Reads your paper/formulas, executes a deep comprehension protocol, produces specifications |
| **Builder** | Writes source code from `spec.md` (never sees the test spec) |
| **Tester** | Validates independently from `test-spec.md` (never sees the code spec) |
| **Simulator** | Runs Monte Carlo studies from `sim-spec.md` (never sees the code spec or the test spec) |
| **Scriber** | Documents architecture, generates tutorials, maintains audit trail |
| **Distiller** | Extracts reusable knowledge for the shared brain (brain mode only) |
| **Reviewer** | Cross-checks all pipelines, audits tolerance integrity, issues ship/no-ship verdict |
| **Shipper** | Commits, pushes, opens PRs, handles package distribution |
The **code**, **test**, and **simulation** pipelines are fully isolated — they never see each other's specs. If all pipelines converge independently, confidence in correctness is high. This is **adversarial verification by design**.
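One way to picture the isolation rule is as a read-permission map over the three spec files. The sketch below is illustrative only: the artifact names come from the table above, while `READABLE_SPECS`, `ALL_SPECS`, and `can_read` are hypothetical names, and the way StatsClaw actually enforces isolation may differ.

```python
# Hypothetical read-permission map. Artifact names come from the README;
# the enforcement mechanism shown here is an assumption for illustration.
ALL_SPECS = {"spec.md", "test-spec.md", "sim-spec.md"}

READABLE_SPECS = {
    "builder": {"spec.md"},        # code pipeline
    "tester": {"test-spec.md"},    # test pipeline
    "simulator": {"sim-spec.md"},  # simulation pipeline
}


def can_read(agent: str, artifact: str) -> bool:
    """Return True if `agent` may read `artifact` under pipeline isolation."""
    if artifact not in ALL_SPECS:
        return True  # this sketch only restricts the three spec files
    # Agents without an entry (e.g. planner, reviewer) are assumed to see all specs.
    return artifact in READABLE_SPECS.get(agent, ALL_SPECS)
```

For example, `can_read("builder", "test-spec.md")` is `False`, while `can_read("reviewer", "test-spec.md")` is `True` under the assumption that the reviewer cross-checks every pipeline.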
---
## Multi-Pipeline Architecture
```
                      planner (bridge)
                     /       |        \
          spec.md   /   test-spec.md   \   sim-spec.md
                   /         |          \
            builder  ─ ─(parallel)─ ─  simulator
        (code pipeline)      |     (simulation pipeline)
                   \         |          /
        implementation.md    |    simulation.md
                     \       |        /
                      \      v       /
                          tester        <-- sequential, after merge-back
                      (test pipeline)
                             |
                          audit.md
                             |
                     scriber (recording)
                             |
                  distiller (brain mode only)
                             |
                    reviewer (convergence)
                             |
                          shipper
```
**Key properties:**
- **Planner is always mandatory** — it bridges all pipelines
- **Builder handles code, scriber handles docs, simulator handles Monte Carlo studies** — for docs-only requests, scriber replaces builder as implementer
- **Builder and simulator run in parallel** (simulation workflows), then **tester validates the merged result** — each pipeline has its own isolated spec
- **Pipeline isolation is enforced** — each pipeline never sees another's spec
- **Adversarial verification** — if all pipelines converge independently, confidence is high
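The convergence idea in the last bullet can be sketched as a pairwise tolerance check over the numbers each pipeline reports. This is a minimal sketch, not the reviewer's actual logic: `pipelines_converge` and the default tolerance are hypothetical, and a real review would compare richer artifacts than single floats.

```python
import math
from itertools import combinations


def pipelines_converge(results: dict[str, float], rel_tol: float = 1e-6) -> bool:
    """Return True when every pair of pipeline results agrees within rel_tol.

    `results` maps a pipeline name (hypothetical keys such as "code",
    "test", "simulation") to the estimate that pipeline produced.
    """
    return all(
        math.isclose(a, b, rel_tol=rel_tol)
        for a, b in combinations(results.values(), 2)
    )
```

Keeping the tolerance an explicit argument makes it visible to whoever audits it. Loosening it until the check passes would defeat the purpose of independent verification.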
---
## Supported Languages
| R | Python | Stata | TypeScript | Go | Rust | C | C++ |
|:-:|:------:|:-----:|:----------:|:--:|:----:|:-:|:---:|
More languages coming — [Julia is next](https://github.com/statsclaw/statsclaw/issues/3)! Want another? [Let us know](https://github.com/statsclaw/statsclaw/issues/new?template=feature-request.yml).
---
## Quick Start
### Prerequisites
1. **Claude Code** — [Install Claude Code](https://claude.ai/code)
2. **GitHub access** — Push access to your target repository
3. **Workspace repo** — A GitHub repo for storing workflow artifacts (auto-created if needed)
### Your First Task
Just tell StatsClaw what you want. It auto-detects the language, selects the right workflow, and starts working:
```
work on https://github.com/your-org/your-package resolve the issues
```
StatsClaw asks clarifying questions when it encounters ambiguity, so your domain expertise guides the process. Results vary with task complexity; expect to iterate.
---
## Workflow
```text
Code: leader → planner → builder → tester → scriber → [distiller]? → reviewer → shipper?
Docs-only: leader → planner → scriber → reviewer → shipper?
Simulation+Code: leader → planner → [builder ∥ simulator] → tester → scriber → [distiller]? → reviewer → shipper?
Simulation-only: leader → planner → simulator → tester → scriber → [distiller]? → reviewer → shipper?
```
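The four sequences above share a common shape, which the following sketch makes explicit. It reads `?` as "optional step" and assumes the distiller runs only in brain mode and the shipper only when shipping is requested. `IMPLEMENTERS` and `agent_sequence` are hypothetical names, not part of StatsClaw.

```python
# Implementer stage per workflow; a tuple marks agents that run in parallel.
IMPLEMENTERS = {
    "code": "builder",
    "docs-only": None,  # scriber acts as the implementer
    "simulation+code": ("builder", "simulator"),
    "simulation-only": "simulator",
}


def agent_sequence(workflow: str, brain_mode: bool = False, ship: bool = False) -> list:
    """Expand a workflow name into the ordered list of agent stages."""
    if workflow not in IMPLEMENTERS:
        raise ValueError(f"unknown workflow: {workflow}")
    steps = ["leader", "planner"]
    if workflow == "docs-only":
        steps.append("scriber")
    else:
        steps += [IMPLEMENTERS[workflow], "tester", "scriber"]
        if brain_mode:
            steps.append("distiller")  # optional, brain mode only
    steps.append("reviewer")
    if ship:
        steps.append("shipper")  # optional, assumed to need an explicit request
    return steps
```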
States: `CREDENTIALS_VERIFIED → NEW → PLANNED → SPEC_READY → PIPELINES_COMPLETE → DOCUMENTED → [KNOWLEDGE_EXTRACTED]? → REVIEW_PASSED → READY_TO_SHIP → DONE`
Signals: `HOLD` (ambiguous, ask user), `BLOCK` (validation failed), `STOP` (unsafe to ship)
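The state list and signals can be read as a small linear state machine with one optional state. The sketch below assumes that a raised signal halts any transition and that only `KNOWLEDGE_EXTRACTED` may be skipped. `next_states` and `advance` are hypothetical helpers, not StatsClaw's implementation.

```python
from typing import Optional

STATES = [
    "CREDENTIALS_VERIFIED", "NEW", "PLANNED", "SPEC_READY", "PIPELINES_COMPLETE",
    "DOCUMENTED", "KNOWLEDGE_EXTRACTED", "REVIEW_PASSED", "READY_TO_SHIP", "DONE",
]
OPTIONAL_STATES = {"KNOWLEDGE_EXTRACTED"}  # the bracketed, brain-mode state
SIGNALS = {
    "HOLD": "ambiguous, ask user",
    "BLOCK": "validation failed",
    "STOP": "unsafe to ship",
}


def next_states(current: str) -> list:
    """States reachable in one step, allowing optional states to be skipped."""
    reachable = []
    for state in STATES[STATES.index(current) + 1:]:
        reachable.append(state)
        if state not in OPTIONAL_STATES:
            break
    return reachable


def advance(current: str, target: str, signal: Optional[str] = None) -> str:
    """Move to `target` if the transition is legal and no signal is raised."""
    if signal is not None:
        raise RuntimeError(f"{signal}: {SIGNALS[signal]}")
    if target not in next_states(current):
        raise ValueError(f"illegal transition {current} -> {target}")
    return target
```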
---
## What Can StatsClaw Help With?
| Task | How it helps | Limitations |
|:-----|:-------------|:------------|
| **Implementing methods** | Assists with translating specs into code | Requires researcher to validate mathematical correctness |
| **Cross-language translation** | Handles R/Python idiom differences | May miss subtle numerical edge cases without careful review |
| **Testing & validation** | Independent test pipeline can catch bugs that tests written alongside the code miss | Empirical verification, not formal proofs |
| **Monte Carlo studies** | Automates simulation harness and reporting | Researcher must design meaningful DGPs and metrics |
| **Paper-driven features** | Reads methodology papers to design new functionality | Extracts concepts, not full estimator implementations |
| **Bug fixing** | Adversarial architecture helps find hidden bugs | Complex domain bugs still need human insight |
| **Documentation** | Generates Quarto books, API docs | Needs researcher review for accuracy |
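To make the Monte Carlo row concrete, here is a minimal sketch of the kind of harness a simulation study automates: repeated draws from a data-generating process (DGP), an estimator applied to each draw, and summary metrics. The DGP (i.i.d. normal), the estimator (sample mean), and the function name `monte_carlo_mean` are illustrative assumptions, not StatsClaw output.

```python
import random
import statistics


def monte_carlo_mean(n_reps: int = 2000, n_obs: int = 50,
                     true_mean: float = 1.0, sd: float = 2.0,
                     seed: int = 0) -> dict:
    """Estimate the bias and RMSE of the sample mean under a normal DGP."""
    rng = random.Random(seed)  # seeded for reproducibility
    estimates = []
    for _ in range(n_reps):
        sample = [rng.gauss(true_mean, sd) for _ in range(n_obs)]
        estimates.append(statistics.fmean(sample))
    bias = statistics.fmean(estimates) - true_mean
    rmse = statistics.fmean((e - true_mean) ** 2 for e in estimates) ** 0.5
    return {"bias": bias, "rmse": rmse}
```

With these defaults the RMSE should land near the theoretical value `sd / sqrt(n_obs)`, which is the kind of sanity check a researcher would still need to specify.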
---
## Example Prompts
```
# Fix a specific issue
fix issue #42 in my-package
# Build from scratch
build a Python package from this R code
# Cross-language migration
rewrite the Python backends in pure R and ship it
# Simulation study
run a Monte Carlo study comparing these three estimators
# Paper to package
build an R package from this PDF
# Paper-driven feature
read Correia (2016) and add network visualization to panelView
# Documentation
update the documentation for v2.0
# Contribute knowledge to the shared brain
/contribute
```
---
## Learn by Example
We provide examples from our own usage. Each is a real repository you can inspect and learn from. Your mileage may vary — these represent what worked for us with active researcher involvement.
| Example | Repo | What it demonstrates |
|:--------|:-----|:---------------------|
| Iterative refactoring (1 to 2) | [`statsclaw/example-fect`](https://github.com/statsclaw/example-fect) | Multi-day, researcher-guided refactoring of an R package |
| Python from R source (0 to 1) | [`statsclaw/example-R2PY`](https://github.com/statsclaw/example-R2PY) | Building a Python package from an R reference |
| Paper to package + Monte Carlo | [`statsclaw/example-probit`](https://github.com/statsclaw/example-probit) | PDF manuscript to R/C++ package + simulation |
| Paper-driven feature addition | [`statsclaw/example-panelView`](https://github.com/statsclaw/example-panelView) | Reading a methodology paper to design a new feature |
See the [workspace example](https://github.com/statsclaw/example-workspace) for the actual workflow artifacts produced during these examples.
---
## What You Install
- `CLAUDE.md` — orchestration policy (the authoritative reference)
- `agents/` — agent definitions (leader, planner, builder, tester, simulator, scriber, distiller, reviewer, shipper)
- `skills/` — shared protocol skills (including credential-setup, isolation, handoff, mailbox, issue-patrol, profile-detection, brain-sync, privacy-scrub)
- `profiles/` — language-specific execution rules (R, Python, TypeScript, Stata, Go, Rust, C, C++)
- `templates/` — runtime artifact templates and repo scaffolding (brain-repo, brain-seedbank-repo)
Agent Teams is enabled at the project level through `.claude/settings.json`.
---
## Runtime Layout
All runtime state lives inside the workspace repo, organized per target repository:
```text
.repos/
├── /                                  # target repo checkout
├── brain/                             # statsclaw/brain clone (brain mode only)
├── brain-seedbank/                    # statsclaw/brain-seedbank clone (brain mode only)
└── workspace/                         # workspace repo (GitHub)
    └── /                              # per-target-repo runtime + logs
        ├── context.md                 # active project context
        ├── CHANGELOG.md               # timeline index of all runs (pushed)
        ├── HANDOFF.md                 # active handoff (pushed)
        ├── ref/                       # reference docs for future work (pushed)
        ├── runs/
        │   └── /                      # per-run artifacts
        │       ├── credentials.md     # push access verification
        │       ├── request.md         # scope and acceptance criteria
        │       ├── status.md          # state machine
        │       ├── impact.md          # affected files and risk areas
        │       ├── comprehension.md   # comprehension verification (from planner)
        │       ├── spec.md            # code pipeline input (from planner)
        │       ├── test-spec.md       # test pipeline input (from planner)
        │       ├── sim-spec.md        # simulation pipeline input (from planner, workflows 11/12)
        │       ├── implementation.md  # code pipeline output (from builder)
        │       ├── simulation.md      # simulation pipeline output (from simulator, workflows 11/12)
        │       ├── audit.md           # test pipeline output (from tester)
        │       ├── ARCHITECTURE.md    # from scriber (primary copy in target repo root)
        │       ├── log-entry.md       # process record (from scriber; promoted to runs/-.md)
        │       ├── docs.md            # documentation changes (from scriber)
        │       ├── brain-contributions.md  # knowledge entries (from distiller, brain mode only)
        │       ├── review.md          # convergence verdict (from reviewer)
        │       ├── shipper.md         # ship actions (from shipper)
        │       ├── mailbox.md         # inter-teammate communication
        │       └── locks/             # write surface locks
        ├── logs/                      # diagnostic logs
        └── tmp/                       # transient data
```
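The layout above lends itself to a simple artifact check before a state transition. The sketch below is a guess at how such a gate could look: `run_dir` and `missing_artifacts` are hypothetical helpers, and `repo_name` and `run_id` stand in for the two directory names the tree leaves unspecified.

```python
from pathlib import Path


def run_dir(root: Path, repo_name: str, run_id: str) -> Path:
    """Build the per-run artifact directory under the workspace checkout.

    `repo_name` and `run_id` are placeholders for the per-target-repo and
    per-run directory names, which the README does not spell out.
    """
    return root / ".repos" / "workspace" / repo_name / "runs" / run_id


def missing_artifacts(run_path: Path, required: list) -> list:
    """Return the required artifact files that do not exist yet."""
    return [name for name in required if not (run_path / name).is_file()]
```

A gate for the builder could then refuse to start while `missing_artifacts(path, ["request.md", "spec.md"])` is non-empty.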
---
## Repository Layout
```text
StatsClaw/
├── CLAUDE.md # orchestration policy
├── README.md
├── agents/ # agent definitions (9 agents including distiller)
├── skills/ # shared protocol skills (13 skills including brain-sync, privacy-scrub)
├── profiles/ # language execution rules (8 languages)
├── templates/ # runtime artifact templates + repo scaffolding (brain-repo, brain-seedbank-repo)
└── .repos/ # target repo checkouts + workspace + brain repos (runtime state, git-ignored)
```
---
## Workspace Repository
Workflow logs, process records, and handoff documents are NOT stored in target repos. Instead, they are synced to a user-specified **workspace repository** on GitHub (e.g., `[username]/workspace`):
```text
workspace/
├── fect/
│   ├── CHANGELOG.md                   # timeline index
│   ├── HANDOFF.md                     # active handoff
│   ├── ref/                           # reference docs for future work
│   │   └── cv-comparison-table.md
│   └── runs/                          # individual workflow logs
│       ├── 2026-03-16-cv-unification.md
│       └── 2026-03-17-convergence-conditioning.md
├── panelview/
│   ├── CHANGELOG.md
│   ├── HANDOFF.md
│   ├── ref/
│   └── runs/
│       └── 2026-03-17-add-feature.md
└── README.md
```
This keeps target repos clean (code + essential docs only) while preserving full traceability in one place.
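The run logs in the example tree appear to follow a date-plus-slug naming pattern. The helper below reproduces that pattern as inferred from the example filenames only. `run_log_name` is a hypothetical function, and the actual naming rule may differ.

```python
import re
from datetime import date


def run_log_name(day: date, title: str) -> str:
    """Build a run-log filename as ISO date, hyphen, lowercase slug, then .md."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{day.isoformat()}-{slug}.md"


print(run_log_name(date(2026, 3, 16), "CV unification"))
# prints "2026-03-16-cv-unification.md"
```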
---
## Shared Brain
StatsClaw has a shared knowledge system: techniques discovered during workflows (mathematical methods, coding patterns, validation strategies, simulation designs) are extracted, privacy-scrubbed, and contributed to a collective knowledge base. When you enable Brain mode, your agents can draw on knowledge contributed by all users.
**How it works:**
1. **Read** — Your agents automatically access relevant knowledge entries from [`statsclaw/brain`](https://github.com/statsclaw/brain)
2. **Contribute** — After noteworthy workflows, the distiller agent extracts reusable knowledge. You review everything and approve or decline; nothing is shared without your explicit consent. You can also run the built-in `/contribute` command at any time to summarize what you learned (what worked, what required manual intervention, and what domain-specific patterns emerged) and submit it as a structured report.
3. **Earn badges** — Accepted contributions earn virtual badges on the [Contributors leaderboard](https://github.com/statsclaw/brain/blob/main/CONTRIBUTORS.md)
**Privacy guarantee:** All contributions are automatically scrubbed of repo names, file paths, usernames, proprietary code, and any identifying information. Only generic, reusable knowledge is shared.
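As a rough illustration of what pattern-based scrubbing can look like, the sketch below redacts GitHub repository URLs, file paths, and `@`-style usernames with regular expressions. It is not the project's privacy-scrub skill: the patterns, placeholders, and the `scrub` function are hypothetical, and regex rules alone would not catch proprietary code or indirect identifiers.

```python
import re

# Order matters: URLs are replaced first so their slashes are not read as paths.
SCRUB_PATTERNS = [
    (re.compile(r"https?://github\.com/[\w.-]+/[\w.-]+"), "<repo>"),
    (re.compile(r"(?<![\w/])(?:~|\.{1,2})?/(?:[\w.-]+/)*[\w.-]+"), "<path>"),
    (re.compile(r"(?<!\w)@[A-Za-z0-9-]+"), "<user>"),
]


def scrub(text: str) -> str:
    """Replace repo URLs, file paths, and @usernames with placeholders."""
    for pattern, placeholder in SCRUB_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```

For instance, `scrub("file /home/alice/src/fit.R by @alice")` yields `"file <path> by <user>"`, while ordinary text such as `and/or` is left untouched.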
| Repo | Purpose |
|:-----|:--------|
| [`statsclaw/brain`](https://github.com/statsclaw/brain) | Curated knowledge — agents read from here |
| [`statsclaw/brain-seedbank`](https://github.com/statsclaw/brain-seedbank) | Contribution staging — users submit PRs here |
Brain mode is optional — you choose at session start. See [Brain System Documentation](.github/BRAIN.md) for full details.
---
## Design Principles
- **Credentials first, work second.** Verify push access before creating a run.
- **Team Leader dispatches, never does.** Leader plans and coordinates; teammates do the work.
- **Multi-pipeline, fully isolated.** Code, test, and simulation pipelines never see each other's specs.
- **Planner first, always.** Every non-trivial request starts with the planner producing the isolated pipeline specs.
- **Adversarial verification by design.** Independent convergence raises confidence in correctness; it is empirical evidence, not a formal proof.
- **Hard gates, not soft advice.** State transitions have preconditions; artifacts are verified.
- **Worktree isolation for writers.** Builder, simulator, and scriber run in isolated git worktrees.
- **Surgical scope.** Each run modifies only what the request requires.
- **Explicit ship actions.** Nothing is pushed without a user instruction or an active patrol skill.
- **Collective knowledge, individual consent.** Brain mode lets agents learn from all users, but nothing is shared without explicit per-workflow approval.
---
## Citation
If you use StatsClaw in your research or software development, please cite our paper:
> Qin, Tianzhu and Yiqing Xu. 2026. "[StatsClaw: An AI-Collaborative Workflow for Statistical Software Development](https://bit.ly/statsclaw)."
BibTeX:
```bibtex
@misc{qinxu2026statsclaw,
  title        = {StatsClaw: An AI-Collaborative Workflow for Statistical Software Development},
  author       = {Qin, Tianzhu and Xu, Yiqing},
  year         = {2026},
  howpublished = {Mimeo, Stanford University},
  url          = {https://bit.ly/statsclaw}
}
```
---
## License
StatsClaw is released under the [MIT License](LICENSE).
---
## Get Involved
We are building StatsClaw in the open. Everyone is welcome.
- **Share an idea** — [Discussions](https://github.com/statsclaw/statsclaw/discussions/categories/ideas)
- **Report a bug** — [Bug report](https://github.com/statsclaw/statsclaw/issues/new?template=bug-report.yml)
- **Contribute code** — [Contributing guide](CONTRIBUTING.md)
- **Contribute knowledge** — Enable Brain mode and your discoveries help everyone. [Learn more](.github/BRAIN.md)
- **See what is planned** — [Roadmap](ROADMAP.md)
---
**[statsclaw.ai](https://statsclaw.ai)**
*A tool for statisticians and econometricians. Works best with an expert in the loop.*