# Registry Forge

> **Experimental:** This codebase is under active development. Its CLI,
> recipe schema, and package layout may change before 1.0.

Registry Forge is a local Rust CLI that turns synthetic or non-real registry
source data into a replayable preparation package.

It is the MVP implementation of the Registry Forge Engine spec. There is no UI
yet. The expected operator is Codex or a developer who runs commands, reads
reports, applies JSON Patch changes to recipes, and edits Crosswalk mappings.

For a guided walkthrough, see [TUTORIAL.md](TUTORIAL.md).

For agent-facing operating instructions, see
[`agent-skills/registry-forge-operator/SKILL.md`](agent-skills/registry-forge-operator/SKILL.md).

## Current Status

Version 0.1.0 is a local-first MVP. It supports file-based CSV and `.xlsx`
sources, strict `forge.recipe.yaml` validation, deterministic semantic alignment
suggestions from a local profile bundle, Crosswalk mapping previews, readiness
validation, and portable package export.

The tool is intended for demos on synthetic or non-real data and for proving the
operator workflow before a UI or hosted service exists.

## What It Does

- Reads local CSV and `.xlsx` sources without modifying them.
- Records source and profile-bundle hashes in `forge.recipe.yaml`.
- Inspects source structure and parser warnings.
- Profiles fields, missingness, distinct counts, duplicate values, top values,
  type hints, candidate identifiers, and candidate code lists.
- Suggests deterministic semantic alignments from a pinned local profile bundle.
- Applies RFC 6902 JSON Patch operations to recipes.
- Generates review-needed Crosswalk mapping scaffolds from accepted alignments.
- Runs Crosswalk previews and writes redacted canonical JSONL samples.
- Validates readiness and blocks false-ready states, so a package is not
  reported as ready while readiness blockers remain.
- Exports a portable package without raw source files.

## Runtime Config Boundary

Forge prepares data mapping packages and candidate artifacts for review. It
does not own Registry Relay, Registry Notary, or registryctl runtime
configuration semantics, product doctor rules, credentials, deployment profiles,
governed apply behavior, or live config changes.

Files under exported `candidates/` directories are review inputs for downstream
authoring flows. They are not deployable runtime configs by themselves. Use
registryctl or the owning product repository to generate runtime config, run
product doctor validation, produce `registry.config.diagnostic_report.v1`
reports, and apply governed config changes.

## Supported Inputs

- CSV files.
- `.xlsx` workbooks via `calamine`.

Legacy binary `.xls` is intentionally rejected in the MVP. Convert those files
to `.xlsx` before import.

Recipe paths for the source, mapping, and profile bundle must be relative and
must not contain `..`. `--source-override` follows the same rule.

## Demo Fixtures

All demo data is synthetic.

- `fixtures/demo`: baseline farmer registry happy path.
- `fixtures/demo-households-csv`: clean household registry happy path with a
  separate semantic profile bundle.
- `fixtures/demo-messy-csv`: messy farmer CSV with duplicate headers, a blank
  header, uneven rows, missing values, duplicate IDs, and sensitive names.

Generated `reports/`, `patches/`, and `previews/` directories are not committed.

## Repository Map

- [src/](src/): CLI and implementation.
- [tests/](tests/): integration tests for command behavior and readiness gates.
- [fixtures/](fixtures/): synthetic demo source data, recipes, mappings, and
  profile bundles.
- [TUTORIAL.md](TUTORIAL.md): guided demo walkthrough.
- [agent-skills/](agent-skills/): repo-local Codex skill for operating Forge.
- [IMPLEMENTATION_LOG.md](IMPLEMENTATION_LOG.md): implementation notes,
  verification definition, and known pitfalls encountered during the MVP.

## Local Setup

Registry Forge pins `crosswalk-core` from the public Crosswalk repository.

```sh
cargo build --workspace
```

## Happy Path

```sh
cargo run -- check-recipe fixtures/demo/forge.recipe.yaml
cargo run -- inspect-source fixtures/demo/forge.recipe.yaml
cargo run -- profile-source fixtures/demo/forge.recipe.yaml
cargo run -- suggest-alignments fixtures/demo/forge.recipe.yaml
cargo run -- preview-transform fixtures/demo/forge.recipe.yaml
cargo run -- validate-output --require-status ready_candidate fixtures/demo/forge.recipe.yaml
cargo run -- export-package fixtures/demo/forge.recipe.yaml --out target/forge-demo-package
```

Replay the exported package against the original source bytes:

```sh
cargo run -- check-recipe target/forge-demo-package/forge.recipe.yaml
cargo run -- preview-transform \
  --source-override fixtures/demo/data/farmers.csv \
  --out target/replay-canonical-samples.redacted.jsonl \
  target/forge-demo-package/forge.recipe.yaml
cargo run -- validate-output \
  --require-status ready_candidate \
  --source-override fixtures/demo/data/farmers.csv \
  target/forge-demo-package/forge.recipe.yaml
cmp target/forge-demo-package/previews/canonical-samples.redacted.jsonl \
  target/replay-canonical-samples.redacted.jsonl
```

## Verification

```sh
cargo fmt --check
cargo clippy --workspace --all-targets -- -D warnings
cargo test --workspace
```

The integration suite covers readiness blockers, path traversal, source hash
checks, package replay, source immutability, deterministic suggestions, mapping
compile failures, and redaction/package leak checks.

## Known MVP Gaps

- Composite identifier detection is not implemented yet.
- XLSX formula diagnostics are pragmatic XML checks, not a full workbook model.
- Workbook XML inspection reopens the file after it has been verified, so a
  narrow time-of-check/time-of-use (TOCTOU) window remains: the file could
  change between verification and inspection.
- The MVP is a single crate. It is meant to be split into core, ingest,
  profile, transform, export, and CLI crates as the surface grows.

## Security

Do not use real personal data with this MVP unless a project-specific data
handling policy explicitly permits it. Report suspected vulnerabilities through
GitHub private vulnerability reporting. See [SECURITY.md](SECURITY.md).

## License

Apache-2.0. See [LICENSE](LICENSE).
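## Path Rule Pre-check (Sketch)

Scripts that wrap Forge may want to reject bad paths before invoking the CLI. The sketch below approximates the path rule from Supported Inputs: recipe paths and `--source-override` must be relative and must not contain `..`. It rests on several assumptions:

- `check_recipe_path` is a hypothetical helper, not part of the registry-forge CLI.
- It reads "must not contain `..`" literally, so any `..` substring is rejected. This is stricter than a per-component check.
- It only understands POSIX-style paths.

Forge's own validation remains authoritative.

```sh
#!/bin/sh
# Hypothetical pre-flight helper; NOT part of the registry-forge CLI.
# Approximates the documented rule: paths must be relative and must not
# contain "..". POSIX-style paths only; Forge's own check may differ.
check_recipe_path() {
  candidate=$1
  # Reject an empty path outright.
  if [ -z "$candidate" ]; then
    echo "rejected: empty path" >&2
    return 1
  fi
  case $candidate in
    /*)
      # A leading slash makes the path absolute.
      echo "rejected: absolute path: $candidate" >&2
      return 1
      ;;
    *..*)
      # Literal reading of the rule: any ".." substring is refused.
      echo "rejected: contains '..': $candidate" >&2
      return 1
      ;;
  esac
  return 0
}

# Example: check an override path before passing it to Forge.
check_recipe_path "fixtures/demo/data/farmers.csv" && echo "ok"  # prints "ok"
```

The literal substring match also rejects names such as `farmers..v2.csv`. That errs on the side of refusing input Forge might accept, which suits a pre-flight check.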