{"id":50391163,"url":"https://github.com/kadubon/oasg","last_synced_at":"2026-05-30T18:01:48.895Z","repository":{"id":356892093,"uuid":"1234392168","full_name":"kadubon/oasg","owner":"kadubon","description":"Local-first, model-agnostic workflow optimizer for long-running AI agents: observable JSONL ledgers, deterministic reducers, no-meta gates, and receipt-backed self-improvement without LLM judges or model-weight updates.","archived":false,"fork":false,"pushed_at":"2026-05-10T09:23:51.000Z","size":355,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-10T10:35:36.771Z","etag":null,"topics":["agent-evaluation","agent-memory","agent-workflows","ai-agents","autonomous-agents","deterministic-replay","jsonl","long-running-agents","model-agnostic","no-meta","ollama","python","self-improving-agents","verification","workflow-automation","workflow-optimization"],"latest_commit_sha":null,"homepage":"https://kadubon.github.io/github.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kadubon.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-10T05:53:30.000Z","updated_at":"2026-05-10T09:23:55.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/kadubon/oasg","commit_stats":null,"previous_names":["kadubon/oasg"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/kadubon/oasg","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kadubon%2Foasg","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kadubon%2Foasg/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kadubon%2Foasg/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kadubon%2Foasg/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kadubon","download_url":"https://codeload.github.com/kadubon/oasg/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kadubon%2Foasg/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33703065,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-30T02:00:06.278Z","response_time":92,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent-evaluation","agent-memory","agent-workflows","ai-agents","autonomous-agents","deterministic-replay","jsonl","long-running-agents","model-agnostic","no-meta","ollama","python","self-improving-agents","verification","workflow-automation","workflow-optimization"],"created_at":"2026-05-30T18:01:47.768Z","updated_at":"2026-05-30T18:01:48.878Z","avatar_url":"https://github.com/kadubon.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# OASG\n\n[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.20107660.svg)](https://doi.org/10.5281/zenodo.20107660)\n[![License: Apache-2.0](https://img.shields.io/badge/License-Apache--2.0-blue.svg)](LICENSE)\n[![Python 3.12+](https://img.shields.io/badge/python-3.12%2B-blue.svg)](pyproject.toml)\n\nObservable-only Autonomic Slack Gradient for local-first AI agent workflow optimization.\n\nOASG is a local-first toolkit for long-running AI-agent workflows. It records what the\nagent can observe, reduces that history into operational state, proposes bounded workflow-policy\nchanges, tests them through receipts, and promotes only changes that improve conservative\noperational viability without protected regression.\n\nThe project target is not a smarter model. The target is a more durable workflow:\n\n\u003e keep running, learn from observable history, and improve operational capability without using an\n\u003e external evaluator as the improvement oracle.\n\nOASG optimizes workflow policy only. It does not fine-tune model weights, does not use an LLM judge,\nand does not claim semantic truth. Deterministic validators, replay receipts, rollback receipts,\nresource counters, and ledger checks are ordinary observable channels.\n\n## What You Can Do With It\n\n- Wrap any local or remote model as an observation source without making that model trusted.\n- Store agent activity as append-only JSONL ledgers with canonical hashes and prefix checks.\n- Reduce long-running workflow history into operational debt, pressure, and viability receipts.\n- Trial workflow-policy changes through shadow/lease ledgers before promotion.\n- Run conservative local optimization loops that can reject, quarantine, roll back, or promote.\n- Export JSON Schemas and conformance fixtures for ports in other languages.\n\nUse OASG when you need durable workflow operation, auditability, and fail-closed self-improvement\naround an agent. Do not use it as a benchmark score, model trainer, LLM judge, sandbox, or semantic\ntruth oracle.\n\nIf you have five minutes, start with\n[`docs/quick_mental_model.md`](docs/quick_mental_model.md), then run the\n[`examples/minimal_agent_integration`](examples/minimal_agent_integration) example.\n\n## Contents\n\n- [Quick Mental Model](#quick-mental-model)\n- [Why This Is Different](#why-this-is-different)\n- [Current Status](#current-status)\n- [Quickstart](#quickstart)\n- [Use OASG With Your Agent](#use-oasg-with-your-agent)\n- [CLI Map](#cli-map)\n- [Model Integration](#model-integration)\n- [Rejection Guide](#rejection-guide)\n- [Experiments and Evidence](#experiments-and-evidence)\n- [Development Checks](#development-checks)\n- [Citation](#citation)\n- [Project Layout](#project-layout)\n- [License](#license)\n\n## Quick Mental Model\n\nOASG is like Git + unit tests + CI gate + rollback receipts for an AI agent workflow. Your agent\nkeeps running in its normal framework. OASG records observable events, checks workflow debt and\nviability, trials policy changes, and promotes only changes with receipt-backed evidence.\n\nRead the five-minute explanation:\n[`docs/quick_mental_model.md`](docs/quick_mental_model.md).\n\n## Why This Is Different\n\n\u003e OASG turns long-running AI agents into self-maintaining workflow systems that improve only from observable operational evidence, without LLM judges, external rewards, or model-weight updates.\n\nMost agent-improvement systems optimize for answer quality, benchmark scores, human feedback,\nLLM-judge feedback, or externally supplied reward functions. OASG instead treats a long-running\nagent as an operational system whose future action capacity can expand or collapse.\n\nThe core object is not accuracy. It is a conservative partial-order vector over:\n\n- viable future action classes;\n- unresolved obligations;\n- validation, parse, replay, rollback, and evidence debt;\n- budget, queue, context, and maintenance pressure;\n- protected semantic, taint, boundary, authority, and effect floors;\n- shadow, lease, gate, promotion, quarantine, and rollback receipts.\n\nThe improvement loop is:\n\n```text\nappend-only JSONL observable history\n  -\u003e canonical hashing and ledger-prefix verification\n  -\u003e deterministic reducers\n  -\u003e finite-chain slack/debt state\n  -\u003e typed pressure vector and scheduler\n  -\u003e bounded workflow-policy mutation batch\n  -\u003e runner-produced shadow/lease trial ledgers\n  -\u003e finite-horizon KLB_2 viability lower bound\n  -\u003e sidecar positive evidence witnesses\n  -\u003e no-meta dominance gate\n  -\u003e safe_non_regression / safe_promotion / active_promoted / reject / quarantine\n```\n\nThe Python package is the reference runtime. The portability contract is language-independent:\ncanonical JSON bytes, SHA-256 domain hashes, JSONL ledgers, JSON receipts, JSON Schemas, and\nconformance fixtures.\n\n## Current Status\n\nPackage version: `1.1.0`.\n\nThis repository is a working reference implementation with a conservative trusted core and an\nexperimental long-running validation suite. It is suitable for local experiments and controlled\nworkflow-policy optimization. It should still be treated as an alpha system for production\nautomation because the safe path intentionally rejects many cases until they have complete receipts.\n\nImplemented:\n\n- OASG-CJ-1 canonical JSON and SHA-256 domain hashing.\n- Append-only JSONL ledger sealing, duplicate handling, prefix verification, and quarantine\n  receipts.\n- Deterministic reducers over finite-chain dimensions and protected debt.\n- Bounded `KLB_2` computation over 8 action classes and 73 trace classes.\n- Typed pressure vectors and persistent scheduler state.\n- Mutator profiles, outcome memory, cooldown, and bounded workflow-policy mutation batches.\n- Structured workflow policy state and mutation patches.\n- Runner-backed shadow and lease receipt paths.\n- `ledger-replay`, explicit shell-free `local-command`, and demo-only `demo-replay` runner modes.\n- Positive evidence witnesses bound to ledger prefixes, comparison contracts, workload manifests,\n  KLB receipts, and trial receipts.\n- No-meta dominance gate with `safe_non_regression`, `safe_promotion`, and conservative rejection.\n- `optimize run`, resumable `optimize watch`, and lock-aware `optimize supervise`.\n- Workflow library state with active policy, active mutations, rollback snapshots, quarantine,\n  retirement, outcome memory, and conflict receipts.\n- Model-agnostic adapters that emit observation events rather than evaluator judgments.\n- JSON Schema export and conformance fixtures.\n- Ollama `gemma4:e4b` experiment profiles with null, inconclusive, positive, interrupted,\n  strong-baseline negative, and nonstationary confirmatory mixed-reversion results retained.\n\nNot implemented or not claimed:\n\n- No model-weight training or fine-tuning.\n- No semantic truth proof.\n- No sandbox guarantee.\n- No unconstrained network, financial, communication, secret-touching, or irreversible effects by\n  default.\n- No active promotion from synthetic/demo evidence.\n- No claim that OASG universally improves all agents or all task distributions.\n- No claim that `gemma4:e4b` became more intelligent.\n\n## Quickstart\n\nRequirements:\n\n- Python `\u003e=3.12`\n- [`uv`](https://docs.astral.sh/uv/)\n\nFrom a fresh checkout:\n\n```bash\nuv sync\nuv run oasg demo quickstart\nuv run oasg doctor\nuv run oasg conformance run examples/conformance\n```\n\nInspect the generated quickstart artifacts:\n\n```bash\nuv run oasg ledger verify examples/quickstart/baseline.jsonl\nuv run oasg reduce examples/quickstart/candidate.jsonl --out examples/quickstart/reducer_snapshot.json\nuv run oasg klb examples/quickstart/reducer_snapshot.json --out examples/quickstart/klb_receipt.json\nuv run oasg gate --baseline examples/quickstart/baseline.jsonl --candidate examples/quickstart/candidate.jsonl --contract examples/quickstart/comparison_contract.json --workload examples/quickstart/workload_manifest.json --witnesses examples/quickstart/positive_evidence_witnesses.json\n```\n\nDefault runtime behavior is local-only and network-free.\n\n### Choose a Path\n\n| goal | start here |\n| --- | --- |\n| understand the concept in 5 minutes | [`docs/quick_mental_model.md`](docs/quick_mental_model.md) |\n| inspect the core receipts | `uv run oasg demo quickstart` |\n| see the shortest agent insertion point | [`examples/minimal_agent_integration`](examples/minimal_agent_integration) |\n| verify a ledger from another implementation | `uv run oasg ledger verify history.jsonl` |\n| wrap an existing agent | [Use OASG With Your Agent](#use-oasg-with-your-agent) |\n| run a local optimization cycle | `uv run oasg optimize run --history history.jsonl --library workflow_library.json --out-dir .oasg/run` |\n| run repeated local supervision | `uv run oasg optimize supervise --history history.jsonl --library workflow_library.json --state optimizer_state.json --out-dir .oasg/supervise` |\n| reproduce the current evidence | [Experiments and Evidence](#experiments-and-evidence) |\n\n## Use OASG With Your Agent\n\nOASG does not require a specific model provider. Your agent, model wrapper, tool runner, or workflow\nengine only needs to emit observable events into an OASG JSONL ledger.\n\n### 1. Record Observable Events\n\nFor a quick local ledger:\n\n```bash\nuv run oasg observe --out history.jsonl --workflow-id my_agent --component-id planner --dimension budget=acceptable --action pure_read=acceptable --assume-complete\n```\n\n`--assume-complete` is a demo shortcut. In real workflows, emit the relevant dimensions, action\nclasses, resources, retry counts, validation results, rollback/evidence receipts, and unresolved\nobligations explicitly. Missing data fails closed.\n\n### 2. Inspect Operational Pressure\n\n```bash\nuv run oasg pressure history.jsonl --out pressure_vector.json\nuv run oasg scheduler history.jsonl --out scheduler_state.json\n```\n\nPressure is diagnostic and typed. It is not a scalar reward and cannot by itself promote a mutation.\n\n### 3. Scaffold a Local Trial Harness\n\n```bash\nuv run oasg harness init --out oasg_harness.py\n```\n\nReplace the template body with your actual local workflow trial. The command must be deterministic\nenough for your use case and must emit a sealed OASG JSONL trial ledger. Promotion evidence must\ncome from runner-produced trial ledgers, not from mutation metadata or model text.\n\n### 4. Run a Conservative Optimization Cycle\n\n```bash\nuv run oasg optimize run --history history.jsonl --library workflow_library.json --out-dir .oasg/run --cycles 1 --runner local-command --runner-arg python --runner-arg oasg_harness.py --runner-arg --mutation --runner-arg \"{mutation}\" --runner-arg --candidate --runner-arg \"{candidate}\"\nuv run oasg library status --library workflow_library.json\n```\n\nThe optimizer performs reduce, KLB, pressure, scheduling, mutation proposal, runner-backed\nshadow/lease trial derivation, comparison over observed trial ledgers, witness creation, gate\nevaluation, and workflow-library update. If receipts are incomplete, the result is rejected or\ninconclusive.\n\n### 5. Run as a Long-Running Local Supervisor\n\n```bash\nuv run oasg optimize supervise --history history.jsonl --library workflow_library.json --state optimizer_state.json --out-dir .oasg/supervise --max-iterations 1 --runner local-command --runner-arg python --runner-arg oasg_harness.py --runner-arg --mutation --runner-arg \"{mutation}\" --runner-arg --candidate --runner-arg \"{candidate}\" --append-lease-observations\nuv run oasg optimize state --state optimizer_state.json\nuv run oasg library history --library workflow_library.json\n```\n\nThe supervisor tracks consumed ledger prefixes, pending trials, scheduler state, mutation outcome\nmemory, library hashes, and append receipts. If history shrinks, forks, or disagrees with the saved\nprefix, it emits a stale/fork receipt and does not promote.\n\n## CLI Map\n\n```bash\nuv run oasg init\nuv run oasg doctor\nuv run oasg schema export --out schemas\nuv run oasg schema policy --out policy_profile.json\n\nuv run oasg ledger verify history.jsonl\nuv run oasg ledger append --ledger history.jsonl --records new_events.jsonl --out history.jsonl\nuv run oasg reduce history.jsonl --out reducer_snapshot.json\nuv run oasg klb reducer_snapshot.json --out klb_receipt.json\nuv run oasg pressure history.jsonl --out pressure_vector.json\nuv run oasg scheduler history.jsonl --out scheduler_state.json\n\nuv run oasg compare --baseline baseline.jsonl --candidate candidate.jsonl --out-dir comparison\nuv run oasg witness --coordinate KLB_2.pure_read --candidate-snapshot comparison/candidate_snapshot.json --candidate-klb comparison/candidate_klb_receipt.json --contract comparison/comparison_contract.json --workload comparison/workload_manifest.json --out comparison/positive_evidence_witnesses.json\nuv run oasg gate --baseline baseline.jsonl --candidate candidate.jsonl --contract comparison/comparison_contract.json --workload comparison/workload_manifest.json --witnesses comparison/positive_evidence_witnesses.json\n\nuv run oasg mutate plan --out-dir mutation --mutation-id mut_001 --coordinate KLB_2.pure_read --action-id pure_read\nuv run oasg mutator profile init --out mutators.json\n\nuv run oasg workload manifest --baseline baseline.jsonl --candidate candidate.jsonl --out-dir comparison\nuv run oasg workload run --mutation mutation/mutation_record.json --candidate candidate.jsonl --workload comparison/workload_manifest.json --out-dir .oasg/workload --runner ledger-replay --trial-ledger-out observed_trial.jsonl\nuv run oasg trial run --phase shadow --mutation mutation/mutation_record.json --candidate candidate.jsonl --workload comparison/workload_manifest.json --out-dir .oasg/trial --runner ledger-replay --trial-ledger observed_trial.jsonl\n\nuv run oasg optimize plan --history history.jsonl --library workflow_library.json --out-dir .oasg/plan\nuv run oasg optimize run --history history.jsonl --library workflow_library.json --out-dir .oasg/run --cycles 1\nuv run oasg optimize watch --history history.jsonl --library workflow_library.json --state optimizer_state.json --out-dir .oasg/watch --max-iterations 1\nuv run oasg optimize supervise --history history.jsonl --library workflow_library.json --state optimizer_state.json --out-dir .oasg/supervise --max-iterations 1\n\nuv run oasg experiment verify-longrun --run-dir experiment/ollama_gemma4_e4b_longrun/runs/latest --out experiment/ollama_gemma4_e4b_longrun/results\nuv run oasg experiment diagnose-promotion --run-dir experiment/ollama_gemma4_e4b_longrun/runs/latest --out experiment/ollama_gemma4_e4b_longrun/results\nuv run oasg conformance run examples/conformance\n```\n\nOperational commands emit deterministic JSON receipts where possible.\n\n## Model Integration\n\nAdapters are convenience wrappers. They are outside the trusted gate and cannot create positive\npromotion evidence by themselves.\n\nIncluded examples:\n\n- `oasg.adapters.invoke_command`: local subprocess observation wrapper.\n- `oasg.adapters.invoke_function`: Python callable observation wrapper.\n- `oasg.adapters.openai_compatible.invoke_openai_compatible`: optional OpenAI-compatible HTTP\n  request wrapper.\n\nThe safe pattern is:\n\n1. call your model or tool;\n2. convert the result into a `ModelEvent`;\n3. seal it into an OASG event record;\n4. append the record to the observable ledger;\n5. let reducers, gates, and trial receipts decide whether workflow policy can change.\n\nLocal Ollama experiments in this repository use only localhost Ollama as the model endpoint.\n\n### Works With Existing Orchestrators\n\nOASG is not a replacement for an agent framework. It can sit beside one:\n\n- plain Python: wrap a function or model call and append an OASG event;\n- LangGraph: LangGraph handles durable execution and resume, OASG handles promotion gates;\n- CrewAI: CrewAI handles crew/task execution, OASG observes outcomes and gates policy changes;\n- any provider: emit JSONL observations and keep provider output outside the trusted gate.\n\nSee [`examples/framework_adapters`](examples/framework_adapters) for dependency-free adapter\npatterns. LangGraph and CrewAI are optional examples, not package dependencies.\n\n## Rejection Guide\n\nCommon statuses:\n\n- `rejected_no_concrete_positive_evidence`: an improved coordinate lacks a valid sidecar witness.\n- `rejected_floor_violation`: a protected floor regressed.\n- `rejected_contaminated_comparison`: baseline/candidate workload pairing is not equivalent.\n- `rejected_effect_policy`: the mutation requests a disallowed effect or promotion class.\n- `rejected_semantic_floor_missing`: a claim-emitting action lacks a semantic-floor policy.\n- `rejected_secret_taint`: secret or unknown-secret taint reached a protected action.\n- `inconclusive_klb_overflow`: bounded `KLB_2` enumeration exceeded the profile cap.\n- `no_valid_candidate`: optimizer found no candidate with complete gate, shadow, lease, and witness\n  receipts.\n- `no_new_work`: watch/supervise saw the same append index and ledger prefix as the prior\n  checkpoint.\n- `stale_optimizer_state`: saved optimizer state and current ledger prefix/append index disagree.\n- `library_conflict`: workflow library changed between load and atomic write.\n\nRejection is not a runtime error in OASG. It is often the correct fail-closed result.\n\n## Experiments and Evidence\n\nThe repository includes local Ollama `gemma4:e4b` experiments. They are designed to test workflow\noperation, not model intelligence. All reported runs used deterministic operational validators and\nkept failed, rejected, and inconclusive receipts.\n\nCurrent evidence bottom line:\n\n- OASG showed a practical workflow-operation improvement over a deliberately weak fixed baseline in\n  the decisive experiment.\n- OASG did not show an incremental improvement over a calibration-selected strong static baseline\n  on a fixed held-out distribution in the strong-baseline v2 experiment.\n- OASG did show time-boxed post-drift recovery over a calibration-selected strong static workflow\n  in the nonstationary strong-baseline protocol.\n- The larger four-variant nonstationary confirmatory run found a real primary post-drift\n  improvement, but its final classification is `phase_specific_nonstationary_support`, not broad\n  `oasg_nonstationary_confirmed` support.\n- Therefore, the scientifically honest claim is conditional: this implementation has positive\n  evidence for workflow adaptation when operational requirements drift, strongest under mixed\n  reversion / policy-retirement-sensitive drift and also present under mild drift, while\n  structural-only support remains below threshold and fixed-distribution strong-baseline evidence\n  remains negative.\n\n### Evidence Summary\n\n| experiment | classification | key result | interpretation |\n| --- | --- | --- | --- |\n| `experiment/ollama_gemma4_e4b_pilot` | `no_clear_effect` | 12 tasks; baseline and adaptive both closed 8/12; active promotions 0 | Initial pilot did not establish adaptation. |\n| `experiment/ollama_gemma4_e4b_pilot` effect profile | `no_clear_effect` | 48 held-out eval tasks; baseline and adaptive both closed 26/48; active promotions 0 | Workflow-sensitive design still did not activate promotion. |\n| `experiment/ollama_gemma4_e4b_longrun` | `inconclusive_no_active_policy` | baseline 276/408 closed; observe-only 277/408 closed; adaptive evaluation was not run because active promotions 0 | Long-run measurement correctly refused to claim OASG effect. |\n| `experiment/ollama_gemma4_e4b_definitive` | `workload_not_sensitive` | mechanism qualification blocked Stage B; no effect claim | The positive-control policy did not establish a useful measurement workload. |\n| `experiment/ollama_gemma4_e4b_decisive` | `oasg_effect_confirmed` | 5 seeds, 680 paired held-out tasks; adaptive debt AUC 2040 -\u003e 921; closure 0 -\u003e 337; hard-floor regressions 0 | Under this preregistered weak-baseline workload, OASG adaptive produced a practical workflow-operation improvement. |\n| `experiment/ollama_gemma4_e4b_strong_baseline` | `promotion_mechanism_failure_vs_strong_baseline` | strong baseline qualified; adaptive readiness active seeds 0/4 required; run interrupted after 7/25 held-out condition blocks | No incremental OASG effect over the strong baseline is claimed. The run was stopped because adaptive activation failed before evaluation, making the primary effect question non-identifiable. |\n| `experiment/ollama_gemma4_e4b_strong_baseline_v2` | `no_incremental_effect_vs_strong_baseline` | 5 seeds, 680 paired held-out tasks; strong static debt AUC 434; OASG adaptive debt AUC 436; debt delta `+2`, CI `[0, 5]`; cost delta `+7652`, CI `[1534, 14346]`; hard-floor regressions 0 | Readiness succeeded, but held-out evaluation did not show incremental OASG value over the calibrated strong static workflow. |\n| `experiment/ollama_gemma4_e4b_nonstationary_strong_baseline` | `oasg_nonstationary_effect_confirmed_timeboxed` | 2 seeds, 48 paired post-drift tasks; strong static debt AUC `112`; OASG adaptive debt AUC `84`; debt delta `-28`, CI `[-51, -10]`; closure `20/48 -\u003e 27/48`; hard-floor regressions `0` | Time-boxed positive evidence that fail-closed OASG adaptation recovered post-drift operational debt over a calibration-selected strong static workflow. The claim is limited to this frozen protocol and is not universal. |\n| `experiment/ollama_gemma4_e4b_nonstationary_confirmatory` | `phase_specific_nonstationary_support` | 4 variants, 5 seeds, 600 paired post-drift tasks; strong static debt AUC `1524`; OASG adaptive debt AUC `1352`; debt delta `-172`, CI `[-220, -121]`; closure `259/600 -\u003e 300/600`; cost delta `-87081`, CI `[-104221, -70419]`; hard-floor regressions `0` | Primary and control comparisons favor OASG. Mixed-reversion / policy-retirement-sensitive drift is strongly supported, mild drift also supports improvement, and structural-only improvement is below the preregistered support threshold. This is phase-specific support, not broad nonstationary confirmation. |\n\n### Decisive Run Details\n\nThe decisive run is the strongest weak-baseline positive evidence in this repository.\n\nArtifacts:\n\n- results report: [`experiment/ollama_gemma4_e4b_decisive/results/report.md`](experiment/ollama_gemma4_e4b_decisive/results/report.md)\n- metrics: [`experiment/ollama_gemma4_e4b_decisive/results/metrics.json`](experiment/ollama_gemma4_e4b_decisive/results/metrics.json)\n- verification: [`experiment/ollama_gemma4_e4b_decisive/results/verification.json`](experiment/ollama_gemma4_e4b_decisive/results/verification.json)\n- promotion diagnostic: [`experiment/ollama_gemma4_e4b_decisive/results/promotion_diagnostic.json`](experiment/ollama_gemma4_e4b_decisive/results/promotion_diagnostic.json)\n\nCondition summary from the decisive run:\n\n| condition | tasks | closed | debt AUC | parse failures | validation failures | unresolved obligations | active mutations |\n| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |\n| `baseline_fixed` | 680 | 0 | 2040 | 680 | 680 | 680 | 0 |\n| `oasg_observe_only` | 680 | 0 | 2040 | 680 | 680 | 680 | 0 |\n| `forced_policy_positive_control` | 680 | 463 | 434 | 0 | 217 | 217 | 0 |\n| `oasg_adaptive` | 680 | 337 | 921 | 235 | 343 | 343 | 6 |\n\nPaired effects:\n\n- Adaptive vs baseline debt AUC delta: `-1119`.\n- Adaptive vs baseline debt AUC reduction: `54.85%`.\n- Bootstrap CI for adaptive-baseline debt delta: `[-1179, -1050]`.\n- Forced positive-control vs baseline debt AUC delta: `-1606`.\n- Verification status: `ok`.\n- Invalid ledgers: none reported.\n- Active seeds: `5/5`.\n- Active mutation ids:\n  - `mut_family_safe_expr_prompt_safe_python_expression`\n  - `mut_receipt_template_only_replay_rollback_receipt`\n  - `mut_receipt_template_only_validator_receipt`\n  - `mut_schema_keys_only_json_schema_repair`\n  - `mut_schema_keys_only_obligation_closure`\n  - `mut_strict_json_minimal_code_transform`\n\nScientific interpretation:\n\n- This is positive evidence that OASG can reduce observable operational debt in the tested\n  `gemma4:e4b` workflow setting.\n- It is not evidence that the model became smarter.\n- It is not evidence of universal OASG effectiveness.\n- The baseline was intentionally weak and brittle. The result proves improvement over that fixed\n  workflow, not over a strong hand-tuned production workflow.\n- The forced positive-control was better than OASG adaptive, so OASG did not find the full available\n  policy improvement. It found a substantial subset.\n- The observe-only condition matched baseline, which supports the interpretation that improvement\n  came from active workflow-policy promotion, not from measurement alone.\n\nStrong-baseline follow-up:\n\n- A later strong-baseline protocol qualified a strong static workflow, but OASG did not produce any\n  runner-ledger-backed active policy change from that strong starting point.\n- That run was interrupted after readiness failure, with classification\n  `promotion_mechanism_failure_vs_strong_baseline`.\n- This is negative evidence for the current implementation's ability to add incremental value over\n  that strong static workflow, not a general proof that OASG cannot help stronger baselines.\n- Artifacts:\n  [`experiment/ollama_gemma4_e4b_strong_baseline/results/20260511T113612Z_interrupted/report.md`](experiment/ollama_gemma4_e4b_strong_baseline/results/20260511T113612Z_interrupted/report.md)\n  and\n  [`experiment/ollama_gemma4_e4b_strong_baseline/results/20260511T113612Z_interrupted/interruption_receipt.json`](experiment/ollama_gemma4_e4b_strong_baseline/results/20260511T113612Z_interrupted/interruption_receipt.json).\n\nStrong-baseline v2 protocol:\n\n- The v2 profile added an explicit `incremental_headroom` gate and then completed held-out\n  evaluation after readiness passed.\n- Stage 0: `strong_baseline_qualified`; the strong static policy reduced calibration debt AUC by\n  `7861` bps versus the weak fixed baseline.\n- Stage 1: `debt_headroom_exists`; calibration canaries found 43 incremental candidates.\n- Stage 2: `adaptive_from_strong_ready`; active changes appeared in all 5 seeds.\n- Stage 3: held-out evaluation did not show incremental gain over strong static:\n  - `strong_static_calibrated`: debt AUC `434`, cost units `1580136`, closed `463/680`.\n  - `oasg_adaptive_from_strong`: debt AUC `436`, cost units `1587788`, closed `463/680`.\n  - primary debt delta `+2`, debt CI `[0, 5]`; primary cost delta `+7652`, cost CI\n    `[1534, 14346]`.\n- Final classification: `no_incremental_effect_vs_strong_baseline`.\n- Interpretation: this is negative evidence for incremental value over this strong static workflow,\n  not evidence that OASG cannot help all strong baselines.\n- Curated artifacts:\n  [`experiment/ollama_gemma4_e4b_strong_baseline_v2/results/report.md`](experiment/ollama_gemma4_e4b_strong_baseline_v2/results/report.md),\n  [`experiment/ollama_gemma4_e4b_strong_baseline_v2/results/metrics.json`](experiment/ollama_gemma4_e4b_strong_baseline_v2/results/metrics.json),\n  and\n  [`experiment/ollama_gemma4_e4b_strong_baseline_v2/results/verification.json`](experiment/ollama_gemma4_e4b_strong_baseline_v2/results/verification.json).\n\nNonstationary strong-baseline protocol:\n\n- This profile tests the narrower OASG claim that adaptation should matter when a strong static\n  workflow is calibrated on Phase A but later faces ordered workload drift.\n- Final classification: `oasg_nonstationary_effect_confirmed_timeboxed`.\n- Integrity: verification `ok`, paired post-drift task count `48`, hard-floor regressions `0`.\n- Primary result:\n  - `strong_static_calibrated`: debt AUC `112`, cost units `148517`, closed `20/48`.\n  - `oasg_adaptive_from_strong`: debt AUC `84`, cost units `137059`, closed `27/48`.\n  - debt delta `-28`, debt CI `[-51, -10]`; cost delta `-11458`, cost CI `[-31074, 7272]`.\n  - adaptation lag: Phase B `1` epoch, Phase C `0`, Phase D `0`.\n- Secondary controls:\n  - OASG vs observe-only debt delta `-29`, CI `[-52, -11]`.\n  - OASG vs rule-adaptive debt delta `-30`, CI `[-50, -12]`.\n- Interpretation: this is positive time-boxed evidence for fail-closed post-drift workflow\n  adaptation over a strong static workflow. It does not contradict the fixed-distribution\n  strong-baseline v2 negative result; it narrows the claim to nonstationary operation.\n- Limits: only 2 seeds, controlled synthetic operational drift, local `gemma4:e4b`, deterministic\n  validators, and repository-defined thresholds. It is not universal evidence and does not imply\n  model intelligence improvement.\n- Curated artifacts:\n  [`experiment/ollama_gemma4_e4b_nonstationary_strong_baseline/results/report.md`](experiment/ollama_gemma4_e4b_nonstationary_strong_baseline/results/report.md),\n  [`experiment/ollama_gemma4_e4b_nonstationary_strong_baseline/results/metrics.json`](experiment/ollama_gemma4_e4b_nonstationary_strong_baseline/results/metrics.json),\n  and\n  [`experiment/ollama_gemma4_e4b_nonstationary_strong_baseline/results/verification.json`](experiment/ollama_gemma4_e4b_nonstationary_strong_baseline/results/verification.json).\n\nNonstationary confirmatory follow-up:\n\n- This profile is the larger follow-up to the time-boxed nonstationary result. It runs four\n  variants: full drift replication, no-mixed-reversion ablation, mixed-reversion-only probe, and\n  delayed-drift recovery.\n- Final classification: `phase_specific_nonstationary_support`.\n- Integrity: verification `ok`, all required variants complete, paired post-drift task count `600`,\n  active post-drift OASG seeds `5`, stable A2 active mutation rows `0`, hard-floor regressions `0`.\n- Primary result:\n  - `strong_static_calibrated`: debt AUC `1524`, cost-to-close units `1207320`, closed `259/600`.\n  - `oasg_adaptive_from_strong`: debt AUC `1352`, cost-to-close units `1120239`, closed `300/600`.\n  - primary debt delta `-172`, debt CI `[-220, -121]`; cost delta `-87081`, cost CI\n    `[-104221, -70419]`; closure delta `+41`.\n- Secondary controls:\n  - OASG vs observe-only debt delta `-172`, CI `[-220, -126]`.\n  - OASG vs rule-adaptive debt delta `-98`, CI `[-170, -27]`.\n- Ablations and drift classes:\n  - mixed-only debt delta `-118`, reduction `1639` bps, CI `[-160, -78]`;\n  - mild-only debt delta `-50`, reduction `1562` bps, CI `[-72, -28]`;\n  - no-Phase-D aggregate debt delta `-54`, reduction `672` bps, CI `[-81, -28]`;\n  - structural-only debt delta `-4`, reduction `83` bps, CI `[-12, 0]`, below the\n    `500` bps support threshold.\n- Interpretation: the larger run supports phase-specific post-drift workflow recovery over the\n  strong static baseline in this frozen local protocol, and the result is not explained by\n  observe-only or the rule-adaptive control. It does not meet the stricter broad nonstationary\n  confirmation contract because structural-only support is too small. The most defensible claim is\n  mixed-reversion / policy-retirement-sensitive support plus additional mild-drift support.\n- Limits: the final `classification_receipt.json` has `broad_effect_claim_allowed: false`,\n  `phase_specific_effect_claim_allowed: true`, and legacy `effect_claim_allowed: false` because\n  `effect_claim_allowed` is retained only for broad `oasg_nonstationary_confirmed` claims. The\n  result does not prove universal OASG effectiveness, model intelligence improvement, or\n  deployability of the oracle control.\n- Curated artifacts:\n  [`experiment/ollama_gemma4_e4b_nonstationary_confirmatory/results/report.md`](experiment/ollama_gemma4_e4b_nonstationary_confirmatory/results/report.md),\n  [`experiment/ollama_gemma4_e4b_nonstationary_confirmatory/results/metrics.json`](experiment/ollama_gemma4_e4b_nonstationary_confirmatory/results/metrics.json),\n  and\n  [`experiment/ollama_gemma4_e4b_nonstationary_confirmatory/results/verification.json`](experiment/ollama_gemma4_e4b_nonstationary_confirmatory/results/verification.json).\n\n### Reproduce the Decisive Experiment\n\nRequires local Ollama with `gemma4:e4b` installed.\n\n```powershell\ncd path\\to\\oasg\nuv sync\nollama list\nuv run python experiment\\ollama_gemma4_e4b_decisive\\scripts\\run_decisive_experiment.py --config experiment\\ollama_gemma4_e4b_decisive\\config_decisive.json\nuv run python experiment\\ollama_gemma4_e4b_decisive\\scripts\\analyze_decisive_results.py --run-dir experiment\\ollama_gemma4_e4b_decisive\\runs\\latest --out experiment\\ollama_gemma4_e4b_decisive\\results\n```\n\nThe effect claim is limited to the frozen workload, model, prompts, validators, implementation, and\ndecision thresholds in that experiment profile.\n\n### Reproduce the Nonstationary Strong-Baseline Experiment\n\nRequires local Ollama with `gemma4:e4b` installed. The default config is a short time-boxed\nnonstationary protocol, not a universal benchmark.\n\n```powershell\ncd path\\to\\oasg\nuv sync\nollama list\nuv run python experiment\\ollama_gemma4_e4b_nonstationary_strong_baseline\\scripts\\run_nonstationary_experiment.py --config experiment\\ollama_gemma4_e4b_nonstationary_strong_baseline\\config_nonstationary.json\nuv run python experiment\\ollama_gemma4_e4b_nonstationary_strong_baseline\\scripts\\analyze_nonstationary_results.py --run-dir experiment\\ollama_gemma4_e4b_nonstationary_strong_baseline\\runs\\latest --out experiment\\ollama_gemma4_e4b_nonstationary_strong_baseline\\results\n```\n\n### Reproduce the Nonstationary Confirmatory Experiment\n\nRequires local Ollama with `gemma4:e4b` installed. The main config is a long all-variant run. On the\nreference local machine, the completed run used `20260509` through `20260513` replicate seeds.\n\n```powershell\ncd path\\to\\oasg\nuv sync\nollama list\nuv run python experiment\\ollama_gemma4_e4b_nonstationary_confirmatory\\scripts\\run_confirmatory_experiment.py --config experiment\\ollama_gemma4_e4b_nonstationary_confirmatory\\config_confirmatory_main.json --all-variants\nuv run python experiment\\ollama_gemma4_e4b_nonstationary_confirmatory\\scripts\\analyze_confirmatory_results.py --run-dir experiment\\ollama_gemma4_e4b_nonstationary_confirmatory\\runs\\latest --out experiment\\ollama_gemma4_e4b_nonstationary_confirmatory\\results\n```\n\n## Development Checks\n\nBefore publishing a change or port:\n\n```bash\nuv run pytest\nuv run ruff check\nuv run mypy src\nuv run oasg conformance run examples/conformance\nuv build\n```\n\nAt the time this README was updated after the final nonstationary confirmatory analysis, these\nchecks passed in the current workspace: `113 passed`, `ruff` clean, `mypy` clean, conformance\n`status: ok`, and a clean package build. Built artifacts were scanned for local paths,\nhigh-confidence secret patterns, and raw experiment run payloads.\n\nThe current public-readiness review is recorded in\n[`docs/publication_audit.md`](docs/publication_audit.md).\n\n## Citation\n\nIf you use OASG, cite the archived software release:\n\n- DOI: [10.5281/zenodo.20107660](https://doi.org/10.5281/zenodo.20107660)\n- Repository: [github.com/kadubon/oasg](https://github.com/kadubon/oasg)\n- Citation metadata: [`CITATION.cff`](CITATION.cff)\n\n```yaml\ncff-version: 1.2.0\ntitle: \"OASG: Observable-only Autonomic Slack Gradient for Local-first AI Agent Workflow Optimization\"\nversion: 1.1.0\ndoi: 10.5281/zenodo.20107660\nrepository-code: \"https://github.com/kadubon/oasg\"\n```\n\n## Keywords\n\nAI agents, agent workflow optimization, long-running agents, local-first AI, model-agnostic agent\nframework, no LLM judge, observable ledgers, deterministic reducers, workflow policy optimization,\nautonomic agents, JSONL ledger, canonical hashing, Ollama experiments, Python uv.\n\n## Project Layout\n\n```text\ntheory.md                      v1.0 theory and specification\ndocs/quick_mental_model.md     five-minute engineering mental model\nsrc/oasg/canonical.py          canonical JSON and hash domains\nsrc/oasg/ledger.py             JSONL sealing and prefix verification\nsrc/oasg/reducers/             deterministic reducers\nsrc/oasg/pressure.py           typed pressure vector calculation\nsrc/oasg/scheduler.py          pressure scheduling and fairness state\nsrc/oasg/mutators.py           workflow-policy mutation proposals\nsrc/oasg/optimizer.py          run/watch/supervise optimizer loops\nsrc/oasg/optimizer_state.py    durable optimizer checkpoints\nsrc/oasg/library.py            workflow library state, rollback, quarantine\nsrc/oasg/policy_state.py       structured workflow policy and mutation patches\nsrc/oasg/harness.py            local harness scaffold\nsrc/oasg/policy_effects.py     demo-only policy-patch smoke semantics\nsrc/oasg/runners.py            ledger-replay/demo-replay/local-command runners\nsrc/oasg/klb.py                bounded KLB_2 enumeration\nsrc/oasg/gate.py               dominance gate and witness validation\nsrc/oasg/schemas/              JSON Schema export\nsrc/oasg/adapters/             model/tool connector contracts\nexamples/                      quickstart and conformance fixtures\nexamples/minimal_agent_integration/ shortest agent-to-ledger-to-gate example\nexamples/framework_adapters/   optional plain Python, LangGraph, and CrewAI patterns\nexperiment/                    Ollama experiment protocols and results\ntests/                         unit, integration, and experiment-script tests\n```\n\n## License\n\nApache-2.0.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkadubon%2Foasg","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkadubon%2Foasg","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkadubon%2Foasg/lists"}