{"id":51381016,"url":"https://github.com/viggomeesters/jsonl-vault-spike","last_synced_at":"2026-07-03T16:09:31.482Z","repository":{"id":367152884,"uuid":"1279488662","full_name":"viggomeesters/jsonl-vault-spike","owner":"viggomeesters","description":"Synthetic JSONL-first vault/context proof of concept for agent-readable personal knowledge systems","archived":false,"fork":false,"pushed_at":"2026-06-24T18:55:19.000Z","size":378,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-24T20:17:18.181Z","etag":null,"topics":["agents","context-engineering","jsonl","personal-knowledge-management","synthetic-data"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/viggomeesters.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":"SECURITY.md","support":"SUPPORT.md","governance":null,"roadmap":"docs/ROADMAP.md","authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":"NOTICE.md","maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-06-24T18:28:47.000Z","updated_at":"2026-06-24T18:55:22.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/viggomeesters/jsonl-vault-spike","commit_stats":null,"previous_names":["viggomeesters/jsonl-vault-spike"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/viggomeesters/jsonl-vault-spike","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/viggomeesters%2Fjsonl-vault-spike","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/viggomeesters%2Fjsonl-vault-spike/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/viggomeesters%2Fjsonl-vault-spike/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/viggomeesters%2Fjsonl-vault-spike/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/viggomeesters","download_url":"https://codeload.github.com/viggomeesters/jsonl-vault-spike/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/viggomeesters%2Fjsonl-vault-spike/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":35092330,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-07-03T02:00:05.635Z","response_time":110,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agents","context-engineering","jsonl","personal-knowledge-management","synthetic-data"],"created_at":"2026-07-03T16:09:29.115Z","updated_at":"2026-07-03T16:09:31.470Z","avatar_url":"https://github.com/viggomeesters.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/jsonl-vault-spike-hero.svg\" alt=\"JSONL Vault Spike hero\" width=\"100%\"\u003e\n\u003c/p\u003e\n\n# jsonl-vault-spike\n\nProject page: \u003chttps://viggomeesters.com/jsonl-vault-spike/\u003e\n\n[![Status](https://img.shields.io/badge/status-proof--of--concept-blue)](#)\n[![Data](https://img.shields.io/badge/data-synthetic--only-green)](#safety-boundary)\n[![Gate](https://img.shields.io/badge/gate-make%20check-111827)](#verification)\n\nA public, synthetic proof of concept for replacing a Markdown/Obsidian-style vault with an **agent-readable JSONL context layer**.\n\nThe repo demonstrates the core shape:\n\n```text\nraw evidence -\u003e typed JSONL records -\u003e context bundles -\u003e generated SQLite/Markdown views\n```\n\n## Why this exists\n\nMarkdown notes are good for humans but weak as an agent source of truth: links are implicit, claims blur into prose, provenance gets lost, and retrieval often depends on guesswork. This spike tests a stricter model where small typed records are canonical and human views are generated.\n\n## Safety boundary\n\n**No real personal data.**\n\nThis repository intentionally uses synthetic examples only. Do not add real vault exports, real names, messages, emails, file paths, credentials, screenshots, or real attachments. The checked-in media fixtures are tiny synthetic files used only to prove content-addressed storage and validation.\n\n## Repository map\n\n| Path | Role | Canonical? |\n| --- | --- | --- |\n| `raw/*.jsonl` | Synthetic source evidence | input |\n| `records/*.jsonl` | Typed records: entities, projects, claims, relations, tasks, decisions, files, attachments, media links, plus 10,000 synthetic note records | yes |\n| `fixtures/import-demo/` | Synthetic Markdown/mail/folder-drop input for the 9000x import flow | fixture input |\n| `objects/sha256/` | Tiny synthetic content-addressed object fixtures used for hash/integrity tests | fixture input |\n| `schema/*.schema.json` | JSON Schema contracts per record type | yes |\n| `retrieval/*.jsonl` | Query hints for agents | yes |\n| `evals/*.jsonl` | Retrieval expectations | yes |\n| `reports/vault-schema-coverage.json` | Coverage proof for every vault-schema type/category pair | generated but tracked |\n| `dist/` | SQLite and bundle outputs | generated |\n| `views/markdown/` | Human-readable Markdown exports | generated examples |\n\n## Quick start\n\n```bash\ngit clone https://github.com/viggomeesters/jsonl-vault-spike.git\ncd jsonl-vault-spike\npython3 -m venv .venv\n.venv/bin/pip install -r requirements.txt\nmake check\n```\n\n## CLI\n\nRun from the repo:\n\n```bash\npython3 scripts/generate_synthetic_dataset.py --count 10000\npython3 scripts/vaultctx.py validate\npython3 scripts/vaultctx.py query vault migration\npython3 scripts/vaultctx.py bundle --goal \"replace markdown with jsonl\"\npython3 scripts/vaultctx.py import-demo\npython3 scripts/vaultctx.py render-import-demo-dashboard\npython3 scripts/vaultctx.py verify-objects\npython3 scripts/vaultctx.py render-media-report\npython3 scripts/vaultctx.py build-sqlite\npython3 scripts/vaultctx.py render-views\npython3 scripts/vaultctx.py inspect-media --path objects/sha256/\u003cprefix\u003e/\u003csha256\u003e.png\n```\n\nMedia/file support is a complete synthetic MVP slice:\n\n- `fixtures/import-demo/` models the source side: Markdown embeds, a synthetic mail with attachment manifest, and folder-drop files;\n- `import-demo` turns those fixtures into generated demo records and content-addressed objects under `dist/import-demo/` without overwriting canonical records;\n- `objects/sha256/` contains three tiny synthetic fixtures: PNG, PDF, and CSV;\n- `records/files.jsonl` stores content hashes, MIME type, byte size, and synthetic blob refs;\n- `records/attachments.jsonl` stores note/source attachment occurrences;\n- `records/media_assets.jsonl` stores media metadata derived from files;\n- `records/media_links.jsonl` stores resolved and missing media references;\n- `verify-objects` proves file records match object bytes by SHA-256 and size;\n- `render-media-report` writes an aggregate-only media summary.\n- `render-import-demo-dashboard` writes a browsable HTML dashboard that shows the flow from source fixture to attachment/media link, file record, SHA-256 object and media asset.\n\nNo real filenames, real attachments, local paths, screenshots, OCR text, transcripts, thumbnails, or base64 payloads are committed.\n\nInstall as a package:\n\n```bash\npython3 -m pip install .\nvaultctx validate\n```\n\n\n## Vault-schema coverage dataset\n\nThe repo includes `records/notes.jsonl` with **10,000 public-safe synthetic note records** generated from [`viggomeesters/vault-schema`](https://github.com/viggomeesters/vault-schema). The generator covers every current vault-schema `type/category` pair at least once, then scales deterministically.\n\nCoverage proof lives in `reports/vault-schema-coverage.json`:\n\n- 11 schema types;\n- 88 type/category pairs;\n- 0 missing pairs;\n- 10,000 generated note records;\n- generated `schema/note.schema.json` constraints for valid `vault_type`, valid `category` per `vault_type`, and valid `area` per `vault_type/category` pair.\n\nThe fixture is now matrix-strict for the fields that `vault-schema` exposes in `type_category_area`. It still does not invent deeper subtype-specific content fields that are not present in the public schema contract.\n\nRegenerate after a schema change:\n\n```bash\npython3 scripts/generate_synthetic_dataset.py --count 10000\nmake check\n```\n\n## Agent usage\n\nAgents should treat `records/*.jsonl` as the source of truth and use generated bundles for bounded context. A useful default flow is:\n\n1. validate record contracts;\n2. query relevant records;\n3. generate a bundle for the current goal;\n4. cite `source` / `evidence` records before making claims;\n5. regenerate views and SQLite after canonical records change.\n\n## Record model\n\nThe examples use `record_type` as the technical discriminator. Domain-specific subtypes remain explicit, for example `entity_type`, `source_type`, `relation_type`, `task_type`, and `vault_type`. See [`docs/RECORD_MODEL.md`](docs/RECORD_MODEL.md) for the practical migration model: Markdown notes become source/entity/relation/claim/task records with stable IDs and references, then Markdown views are regenerated from JSONL.\n\n## Testing against a real Obsidian vault\n\nUse [`docs/VAULT_EVALUATION.md`](docs/VAULT_EVALUATION.md) for the read-only, private dry-run protocol. The public repo stays synthetic; real-vault evaluation output belongs under `.local/` or `/tmp/` and must not be committed.\n\nLocal aggregate comparison:\n\n```bash\npython3 scripts/evaluate_obsidian_vault.py --vault /path/to/local/vault --limit 75 --out .local/vault-eval\n```\n\nThis writes `aggregate-metrics.json`, `scorecard.json`, and `value-prop-comparison.html` under `.local/vault-eval/`. The report contains counts and percentages only: no note titles, paths, body text, names, or screenshots.\n\n## Verification\n\nFull local gate:\n\n```bash\nmake check\npython3 -m py_compile jsonl_vault_spike/*.py scripts/*.py tests/*.py\n```\n\n`make check` includes:\n\n- repository guard for public-data safety and required public files;\n- JSONL record validation;\n- content-addressed synthetic object hash validation;\n- tests;\n- SQLite build;\n- aggregate media report rendering;\n- Markdown view rendering;\n- demo bundle generation.\n\n## Package and release\n\nSee [`docs/PACKAGE.md`](docs/PACKAGE.md). The README hero prompt/provenance is saved in [`docs/HERO_PROMPT.md`](docs/HERO_PROMPT.md).\n\n## Contributing\n\nRead [`CONTRIBUTORS.md`](CONTRIBUTORS.md), [`SUPPORT.md`](SUPPORT.md), and [`SECURITY.md`](SECURITY.md) first. Keep all examples synthetic.\n\n## License\n\nMIT. See [`LICENSE`](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fviggomeesters%2Fjsonl-vault-spike","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fviggomeesters%2Fjsonl-vault-spike","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fviggomeesters%2Fjsonl-vault-spike/lists"}