https://github.com/franchoy/coldkeep
coldkeep is an experimental, local-first, content-addressed file storage engine with verifiable integrity, written in Go. Files are split into content-addressed chunks, packed into container files on disk, and tracked through PostgreSQL metadata to attempt deduplication.
- Host: GitHub
- URL: https://github.com/franchoy/coldkeep
- Owner: franchoy
- License: apache-2.0
- Created: 2026-02-15T19:58:15.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2026-03-29T20:01:08.000Z (3 months ago)
- Last Synced: 2026-03-29T20:23:53.853Z (3 months ago)
- Topics: backup, cold-storage, content-defined-chunking, deduplicate, go, research-project, storage-engines
- Language: Go
- Homepage:
- Size: 14 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: SECURITY.md
# coldkeep

Correctness-first cold storage engine
• Content-addressed • Built-in deduplication • Deterministic restore
• Verifiable integrity • Crash-safe • GC-safe
## Branding
Coldkeep uses a visual identity based on an ice cube vault:
- 🧊 cold storage (ice cube)
- 🔒 secure data (vault door)
- 🗄️ structured containers (internal shelves)
## Project Status





> Status: v1.9 formalizes transform-based storage semantics (logical/compressed/physical layers) with block-level compression and explicit staged verification, while preserving deterministic restore, GC safety, snapshot semantics, and mixed-repository compatibility.
> Migration note (v1.9): existing v1.7/v1.8 payloads remain readable through compatibility paths with no forced rewrite or recompression. Missing PostgreSQL schema requires manual schema application or `COLDKEEP_DB_AUTO_BOOTSTRAP=true`. Existing older schemas are auto-upgraded to the required v15 schema at startup.
## Current release state
Coldkeep v1.11 introduced the behavior-preserving engine facade.
The current development focus is v1.12:
- migrate business orchestration into engine entry points;
- introduce catalog/metadata facade boundaries;
- prepare the SQLite-first local catalog direction;
- preserve PostgreSQL compatibility;
- keep CLI behavior, JSON output, exit codes, and repository/storage formats stable.
The v1.12 migration must preserve existing behavior first and prove parity before lifting logic. No
command is routed through the engine unless its request/result contract can represent the existing
command behavior.
coldkeep is a local-first content-addressed storage engine focused on deterministic restore,
explicit integrity verification, and safe lifecycle behavior under failure scenarios.
Now with snapshot lineage, diff summaries, and safe deletion insights.
## Why coldkeep?
coldkeep is designed for correctness-first cold storage.
Unlike traditional backup tools, it emphasizes:
- deterministic, byte-identical restore
- content-addressed deduplication
- explicit, test-backed integrity checks
- safe recovery and reference-safe garbage collection
- machine-readable CLI behavior suitable for automation
The goal is confidence and recoverability over maximum throughput.
v1.7 performance work followed the existing execution model: bounded worker-based commands under explicit safety constraints, without turning coldkeep into a fully concurrent daemon or changing on-disk format, chunk layout, or operator-visible schema compatibility. v1.8 introduced packed multi-chunk storage blocks and completed AES-GCM packed-block integration. v1.9 builds on that foundation with formal transform-aware storage semantics, block-level compression, and explicit verification stages while preserving restore determinism, snapshot semantics, and GC safety.
## Features
- Snapshot lineage (`--from`)
- Snapshot diff summaries
- Snapshot tree visualization
- Safe deletion preview (`--dry-run`)
- Read-only observability (`stats`, `inspect`)
- Exact GC simulation with trace support
- Built-in deduplication
- Deterministic restore
## Status
Coldkeep has ten explicit correctness layers:
- v1.0: storage correctness (restore determinism, integrity, recovery, GC safety)
- v1.1: interaction correctness (CLI orchestration, machine-readable contracts, batch semantics)
- v1.2: physical-file graph coherence, explicit repair semantics, audited GC refusal, and invariant-aware batch maintenance reporting
- v1.3: snapshot-based retention as a correctness layer (immutable point-in-time captures, snapshot-protected GC, reachability audits)
- v1.4: snapshot clarity and lifecycle hardening (explicit lineage semantics, safer dry-run wording, stricter pre-release verification guidance)
- v1.5: chunker-evolution compatibility contract clarity (mixed-version repositories, explicit new-writes-only chunker policy)
- v1.6: observability and simulation contract hardening (read-only introspection, exact GC simulation parity, trace channel behavior)
- v1.7: controlled-execution performance validation (benchmarking, deterministic comparison, and release-readiness safety proof without storage-format or schema-breaking change)
- v1.8: packed block abstraction and AES-GCM packed-block integration (multi-chunk storage blocks, dual-compat read path, locked block-size defaults, configurable operator override, release hardening)
- v1.9: transform-based storage architecture freeze (block-level compression, logical/compressed/physical hash semantics, metadata-driven read path, and explicit staged verification)
Guarantees are enforced through automated validation and CI gates; see [VALIDATION_MATRIX.md](VALIDATION_MATRIX.md) for guarantee-to-evidence mapping.
If you are new to the project, start here, then continue to [ARCHITECTURE.md](ARCHITECTURE.md) for the internal model and [VALIDATION_MATRIX.md](VALIDATION_MATRIX.md) for the guarantee-to-evidence map.
## v1.9 Storage Contract
- v1.9 keeps packed storage blocks as the default write path for new data.
- The default packed block size is 1 MiB.
- `COLDKEEP_BLOCK_TARGET_SIZE_MB` exists as an advanced operator tuning override for new writes only. Valid values for v1.9: `1`, `2`, `3` (MiB). Other values log a warning and use the locked default. This override is retained for benchmarking and specialized operator tuning; production deployments should use the default.
- `COLDKEEP_PACKED_BLOCK_SIZE_MIB` is a legacy fallback environment variable checked only if `COLDKEEP_BLOCK_TARGET_SIZE_MB` is not set. It is accepted for backward compatibility; new configurations should use `COLDKEEP_BLOCK_TARGET_SIZE_MB`.
- v1.9 reads existing v1.7/v1.8 repositories without rewriting historical data.
- v1.9 writes packed blocks for new data through `storage_blocks` and `chunk_block_refs`.
- Mixed repositories containing legacy v1.7/v1.8 data and new v1.9 compressed/encrypted blocks are valid steady-state.
- v1.7 is not guaranteed to read repositories that contain v1.8/v1.9 packed-block data.
- Both `plain` and `aes-gcm` codec settings work end-to-end with packed writes. When `COLDKEEP_CODEC=aes-gcm`, the full encoded block is AES-GCM encrypted and `storage_blocks.codec` is set to `"aes-gcm"`; stored bytes are a 12-byte nonce prefix followed by the ciphertext. When `COLDKEEP_CODEC=plain`, `storage_blocks.codec` is `"none"` and stored bytes are the plaintext encoded block. The read path (`StorageBlockReader`) handles both layouts transparently using per-block metadata.
- Compression settings (`none` / `zstd`) affect future writes only and never rewrite historical blocks.
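The block-size override rules above can be sketched as a small resolver. This is a minimal illustration, not coldkeep's actual code: the function name `resolveBlockSizeMiB` is hypothetical, and two details are assumptions because the contract does not state them: that the legacy variable is validated against the same `1`, `2`, `3` set, and that a variable set to an empty string counts as "set".

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

const defaultBlockSizeMiB = 1 // locked default from the storage contract

// resolveBlockSizeMiB picks the packed block target size for new writes.
// lookup has the same shape as os.LookupEnv so it can be faked in tests.
func resolveBlockSizeMiB(lookup func(string) (string, bool)) (int, string) {
	name := "COLDKEEP_BLOCK_TARGET_SIZE_MB"
	raw, ok := lookup(name)
	if !ok {
		// Legacy fallback: consulted only when the primary variable is not set.
		name = "COLDKEEP_PACKED_BLOCK_SIZE_MIB"
		raw, ok = lookup(name)
	}
	if !ok {
		return defaultBlockSizeMiB, ""
	}
	n, err := strconv.Atoi(raw)
	if err != nil || n < 1 || n > 3 {
		// Assumption: the legacy variable is validated like the primary one.
		warning := fmt.Sprintf("%s=%q is not one of 1, 2, 3; using default %d MiB",
			name, raw, defaultBlockSizeMiB)
		return defaultBlockSizeMiB, warning
	}
	return n, ""
}

func main() {
	size, warning := resolveBlockSizeMiB(os.LookupEnv)
	if warning != "" {
		fmt.Fprintln(os.Stderr, "warning:", warning)
	}
	fmt.Printf("packed block target size: %d MiB\n", size)
}
```

Passing the lookup function in, rather than calling `os.LookupEnv` directly, keeps the precedence rule testable without touching the process environment.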
## Compression and Integrity Contract (Pre-v1.10 Freeze)
Compression behavior:
- Compression is block-level.
- Compression happens before encryption.
- Compression configuration affects only newly written blocks.
- Existing blocks are never recompressed automatically.
- Reads and verify use per-block metadata, so mixed repositories (legacy + new transform metadata) are valid steady-state.
- Compression is store-if-smaller: some zstd-configured blocks are intentionally stored uncompressed when compression would expand payload size.
- Compression does not change dedup identity; dedup remains anchored to logical block content.
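The ordering described above (compress first, keep the compressed form only if it is smaller, then encrypt) can be sketched with the Go standard library. This is an illustration under stated assumptions, not coldkeep's write path: DEFLATE from `compress/flate` stands in for zstd because the standard library has no zstd package, all function names are hypothetical, and the all-zero key is for demonstration only. The stored layout follows the contract: a 12-byte nonce prefix followed by the AES-GCM ciphertext.

```go
package main

import (
	"bytes"
	"compress/flate"
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
	"io"
)

// compressIfSmaller returns the DEFLATE form of logical only when it is
// strictly smaller; otherwise it returns logical unchanged (store-if-smaller).
func compressIfSmaller(logical []byte) ([]byte, bool, error) {
	var buf bytes.Buffer
	w, err := flate.NewWriter(&buf, flate.BestSpeed)
	if err != nil {
		return nil, false, err
	}
	if _, err := w.Write(logical); err != nil {
		return nil, false, err
	}
	if err := w.Close(); err != nil {
		return nil, false, err
	}
	if buf.Len() < len(logical) {
		return buf.Bytes(), true, nil
	}
	return logical, false, nil
}

// decompress reverses compressIfSmaller using the per-block compressed flag.
func decompress(payload []byte, compressed bool) ([]byte, error) {
	if !compressed {
		return payload, nil
	}
	r := flate.NewReader(bytes.NewReader(payload))
	defer r.Close()
	return io.ReadAll(r)
}

func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal encrypts payload and returns nonce || ciphertext (tag included).
func seal(key, payload []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize()) // 12 bytes for standard GCM
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, payload, nil), nil
}

// open splits off the nonce prefix, then authenticates and decrypts the rest.
func open(key, stored []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	n := gcm.NonceSize()
	if len(stored) < n {
		return nil, errors.New("stored block is shorter than the nonce prefix")
	}
	return gcm.Open(nil, stored[:n], stored[n:], nil)
}

func main() {
	key := make([]byte, 32) // all-zero demo key; real keys must be random and secret
	logical := bytes.Repeat([]byte("coldkeep "), 1000)
	payload, compressed, err := compressIfSmaller(logical)
	if err != nil {
		panic(err)
	}
	stored, err := seal(key, payload)
	if err != nil {
		panic(err)
	}
	opened, err := open(key, stored)
	if err != nil {
		panic(err)
	}
	restored, err := decompress(opened, compressed)
	if err != nil {
		panic(err)
	}
	fmt.Printf("logical=%d stored=%d compressed=%v roundtrip=%v\n",
		len(logical), len(stored), compressed, bytes.Equal(restored, logical))
}
```

Compressing before encrypting matters because ciphertext is effectively incompressible; reversing the order would make the compression step pointless.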
Integrity checkpoints:
- `logical_hash` (`block_hash`) verifies decoded logical block content.
- `payload_hash` is a deprecated lowercase-hex mirror of `block_hash` retained for compatibility/observability only.
- `compressed_hash` verifies pre-encryption compressed payload.
- `physical_hash` verifies exact persisted bytes in container storage.
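The three checkpoints above suggest a verification pass that walks the layers from the outside in and reports which one failed. The sketch below is hypothetical throughout: the README does not state the hash algorithm or the stage order, so SHA-256 and the physical, compressed, logical order are assumptions, and the "decrypt" and "decompress" transforms are toy stand-ins (an XOR and a one-byte marker) chosen only to keep the example self-contained.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// blockMeta holds the per-block hashes named in the integrity contract.
type blockMeta struct {
	PhysicalHash   string
	CompressedHash string
	LogicalHash    string
}

type transform func([]byte) ([]byte, error)

func sum(b []byte) string {
	h := sha256.Sum256(b) // assumption: the real hash algorithm is not stated
	return hex.EncodeToString(h[:])
}

// verifyStaged returns the name of the first failing stage, or "" on success.
func verifyStaged(stored []byte, meta blockMeta, decrypt, decompress transform) (string, error) {
	if sum(stored) != meta.PhysicalHash {
		return "physical", errors.New("persisted bytes do not match physical_hash")
	}
	compressed, err := decrypt(stored)
	if err != nil {
		return "compressed", err
	}
	if sum(compressed) != meta.CompressedHash {
		return "compressed", errors.New("pre-encryption payload does not match compressed_hash")
	}
	logical, err := decompress(compressed)
	if err != nil {
		return "logical", err
	}
	if sum(logical) != meta.LogicalHash {
		return "logical", errors.New("decoded content does not match logical_hash")
	}
	return "", nil
}

// xorBytes is a toy, self-inverse stand-in for decryption. NOT real crypto.
func xorBytes(b []byte) ([]byte, error) {
	out := make([]byte, len(b))
	for i, v := range b {
		out[i] = v ^ 0x5A
	}
	return out, nil
}

// stripMarker is a toy stand-in for decompression: it removes a 'Z' prefix.
func stripMarker(b []byte) ([]byte, error) {
	if len(b) == 0 || b[0] != 'Z' {
		return nil, errors.New("missing marker")
	}
	return b[1:], nil
}

// demoBlock builds stored bytes and matching metadata with the toy transforms.
func demoBlock(logical []byte) ([]byte, blockMeta) {
	compressed := append([]byte{'Z'}, logical...)
	stored, _ := xorBytes(compressed)
	return stored, blockMeta{
		PhysicalHash:   sum(stored),
		CompressedHash: sum(compressed),
		LogicalHash:    sum(logical),
	}
}

func main() {
	stored, meta := demoBlock([]byte("hello"))
	stage, err := verifyStaged(stored, meta, xorBytes, stripMarker)
	fmt.Printf("clean block: stage=%q err=%v\n", stage, err)

	stored[0] ^= 1 // simulate on-disk corruption
	stage, err = verifyStaged(stored, meta, xorBytes, stripMarker)
	fmt.Printf("corrupt block: stage=%q err=%v\n", stage, err)
}
```

Checking the persisted bytes first means on-disk corruption is classified before any decode step runs on untrusted input.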
## Core Guarantees
### Summary
- deterministic, byte-identical restore
- no exposure of partially written or inconsistent data
- GC is reference-safe: no reachable chunk is ever deleted
- atomic restore replacement (within single-node local filesystem semantics)
- safe in-process concurrent storage operations
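Atomic replacement on a single-node local filesystem is commonly achieved by writing to a temporary file in the destination directory and renaming it over the target. The sketch below shows that general technique; whether coldkeep implements restore replacement this way is an assumption, and `replaceAtomically` and `demo` are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// replaceAtomically writes data to a temp file next to dest, syncs it, and
// renames it over dest, so readers see either the old or the new content.
func replaceAtomically(dest string, data []byte) error {
	tmp, err := os.CreateTemp(filepath.Dir(dest), ".restore-*")
	if err != nil {
		return err
	}
	// Cleans up on failure; after a successful rename the temp name is gone
	// and the resulting error is ignored.
	defer os.Remove(tmp.Name())
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), dest)
}

// demo replaces an existing file and returns what a reader sees afterwards.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "coldkeep-sketch-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	dest := filepath.Join(dir, "report.txt")
	if err := os.WriteFile(dest, []byte("old"), 0o644); err != nil {
		return "", err
	}
	if err := replaceAtomically(dest, []byte("new")); err != nil {
		return "", err
	}
	got, err := os.ReadFile(dest)
	return string(got), err
}

func main() {
	got, err := demo()
	fmt.Println(got, err)
}
```

The temp file is created in the same directory as the destination because a rename is only atomic within one filesystem.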
### Core invariants
Guarantee IDs are stable and tracked in [VALIDATION_MATRIX.md](VALIDATION_MATRIX.md):
- G1: deterministic, byte-identical restore
- G2: repeat store does not drift chunk graph
- G3: no exposure of partially written or inconsistent data
- G4: GC is reference-safe (no reachable chunk is deleted)
- G5: atomic restore replacement (single-node local filesystem semantics)
- G6: safe in-process concurrent storage operations
- G7: deep corruption detection (payload/offset/tail)
- G8: corrective health gate contract stability
- G9: deterministic batch CLI orchestration and automation-safe contract behavior
- G10: current-state physical mapping graph coherence is audited in standard verify
- G11: GC executes only on an audited coherent physical-root graph
- G12: invariant failures expose stable machine-readable classification and operator guidance
- G13: batch maintenance commands expose deterministic execution semantics and invariant-aware per-item reporting
- G14: snapshot-retained content is GC-safe and protected by liveness union (current + snapshot roots)
- G15: snapshot deletion only changes metadata and future GC eligibility (content preserved)
- G16: stats expose snapshot-retention pressure to operators (retained-only-by-current, retained-only-by-snapshot, shared)
- G17: verify and doctor audit persisted snapshot reachability integrity and report retention context
Definitions and evidence mapping for G1-G17 are tracked in [VALIDATION_MATRIX.md](VALIDATION_MATRIX.md).
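G4 and G14 together describe a liveness union: a chunk is reclaimable only if neither current state nor any snapshot reaches it. A minimal sketch of that rule follows. It is a simplified in-memory model with hypothetical names (`reclaimable`, `fileChunks`); the real engine tracks this through database metadata and audits the graph before GC runs.

```go
package main

import (
	"fmt"
	"sort"
)

// reclaimable returns the chunk IDs that no current file and no
// snapshot-retained file references. fileChunks maps file -> chunk IDs.
func reclaimable(allChunks []string, fileChunks map[string][]string, currentFiles, snapshotFiles []string) []string {
	live := map[string]bool{}
	// Liveness union: roots from current state plus roots from snapshots.
	for _, roots := range [][]string{currentFiles, snapshotFiles} {
		for _, f := range roots {
			for _, c := range fileChunks[f] {
				live[c] = true
			}
		}
	}
	var dead []string
	for _, c := range allChunks {
		if !live[c] {
			dead = append(dead, c)
		}
	}
	sort.Strings(dead) // deterministic report ordering
	return dead
}

func main() {
	all := []string{"c1", "c2", "c3", "c4"}
	files := map[string][]string{
		"fileA": {"c1", "c2"}, // still in current state
		"fileB": {"c2", "c3"}, // removed from current state, kept by a snapshot
	}
	// With the snapshot in place only the orphan c4 is reclaimable.
	fmt.Println(reclaimable(all, files, []string{"fileA"}, []string{"fileB"}))
	// After the snapshot is deleted, c3 becomes eligible as well.
	fmt.Println(reclaimable(all, files, []string{"fileA"}, nil))
}
```

The second call illustrates G15: deleting a snapshot removes no content by itself; it only changes what a later GC may reclaim.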
Documentation is split into:
- [README.md](README.md) for overview, quickstart, and CLI usage
- [ARCHITECTURE.md](ARCHITECTURE.md) for the internal model, invariants, lifecycle, and trust boundary
- [COMPATIBILITY.md](COMPATIBILITY.md) for version-compatibility, chunker-evolution contract, and explicit non-guarantees
- [VALIDATION_MATRIX.md](VALIDATION_MATRIX.md) for guarantee-to-evidence mapping
- [CONTRIBUTING.md](CONTRIBUTING.md) for contributor workflow, local CI guidance, and stats benchmark commands for observability-sensitive changes
- [PRE_RELEASE_CHECKLIST.md](PRE_RELEASE_CHECKLIST.md) for release-gate execution
- [SECURITY.md](SECURITY.md) for the threat model and security limits
- [docs/internal/storage_compatibility_matrix.md](docs/internal/storage_compatibility_matrix.md) for the formal storage compatibility matrix and benchmark scope split
- [docs/PATH_IDENTITY.md](docs/PATH_IDENTITY.md) for current-state path identity policy
- [CHANGELOG.md](CHANGELOG.md) for milestone history
For the deeper model (invariants, lifecycle, validity, recovery, trust boundary), see [ARCHITECTURE.md](ARCHITECTURE.md).
## Chunking at a Glance
coldkeep uses content-defined chunking (CDC).
- chunk boundaries depend on data patterns (not fixed-size windows),
- different chunker versions can choose different boundary strategies,
- stored state is a chunked reconstruction recipe (`file_chunk -> chunk -> blocks`), not a raw whole-file blob.
Example:
```text
File A (v1):
[chunk1][chunk2][chunk3]
File B (v2):
[chunk4][chunk5]
```
Even with overlapping content, layout can differ across chunker versions.
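To make "boundaries depend on data patterns" concrete, here is a toy content-defined chunker. It is not coldkeep's `v1-simple-rolling` or `v2-fastcdc` implementation: the gear-style rolling hash, the 6-bit boundary mask, the 512-byte maximum, the absence of a minimum chunk size, and the names `split` and `reuseRatio` are all illustrative assumptions.

```go
package main

import (
	"fmt"
	"math/rand"
)

const (
	boundaryMask = 0x3F // cut when the low 6 hash bits are zero (~64-byte average)
	maxChunk     = 512  // hard upper bound on chunk size
)

// gearTable maps each byte value to a fixed pseudo-random 64-bit constant.
func gearTable() [256]uint64 {
	var t [256]uint64
	r := rand.New(rand.NewSource(1)) // fixed seed keeps boundaries deterministic
	for i := range t {
		t[i] = r.Uint64()
	}
	return t
}

// split cuts data wherever the rolling hash of recent bytes satisfies the
// boundary condition, so cut points follow content rather than offsets.
func split(data []byte) [][]byte {
	gear := gearTable()
	var chunks [][]byte
	var h uint64
	start := 0
	for i, b := range data {
		h = (h << 1) + gear[b]
		if h&boundaryMask == 0 || i-start+1 >= maxChunk {
			chunks = append(chunks, data[start:i+1])
			start, h = i+1, 0
		}
	}
	if start < len(data) {
		chunks = append(chunks, data[start:])
	}
	return chunks
}

// reuseRatio reports the fraction of chunks in b whose content also occurs in a.
func reuseRatio(a, b [][]byte) float64 {
	if len(b) == 0 {
		return 0
	}
	seen := map[string]bool{}
	for _, c := range a {
		seen[string(c)] = true
	}
	hit := 0
	for _, c := range b {
		if seen[string(c)] {
			hit++
		}
	}
	return float64(hit) / float64(len(b))
}

func main() {
	data := make([]byte, 64*1024)
	rand.New(rand.NewSource(42)).Read(data) // seed-driven input, as CI requires
	shifted := append([]byte{0xFF}, data...) // insert one byte at the front
	a, b := split(data), split(shifted)
	fmt.Printf("chunks: %d vs %d, reuse after 1-byte shift: %.1f%%\n",
		len(a), len(b), 100*reuseRatio(a, b))
}
```

With fixed-size windows a one-byte insertion would move every later boundary; here the cut decision depends only on the last few bytes, so boundaries tend to realign shortly after the edit.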
## Chunker Versions
- each committed logical file stores `chunker_version` metadata,
- one repository can contain multiple chunker versions,
- chunker version is selected at store time,
- fresh v1.5+ repositories default new writes to `v2-fastcdc`,
- upgraded repositories preserve prior write default (`v1-simple-rolling` unless explicitly changed),
- chunks may be reused across chunker versions if their content is identical,
- cross-version reuse is opportunistic; no particular reuse ratio is guaranteed,
- `chunker_version` on chunk rows is origin metadata, not a reuse constraint,
- restore is recipe-driven and does not depend on the active write chunker.
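The points above (recipes carry their own chunker label, identical chunks are shared across versions, restore follows the recipe) can be illustrated with a tiny in-memory content-addressed store. Everything here is a hypothetical model: SHA-256 as the content address is an assumption, and `splitEvery` uses fixed-size splits purely as a stand-in for two different chunker versions.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// recipe is the ordered list of chunk hashes needed to rebuild one file.
type recipe struct {
	ChunkerVersion string // origin metadata only; restore never consults it
	ChunkHashes    []string
}

type store struct{ chunks map[string][]byte }

func newStore() *store { return &store{chunks: map[string][]byte{}} }

// put stores each piece under its content hash, skipping pieces already present.
func (s *store) put(version string, pieces [][]byte) recipe {
	r := recipe{ChunkerVersion: version}
	for _, p := range pieces {
		sum := sha256.Sum256(p)
		key := hex.EncodeToString(sum[:])
		if _, ok := s.chunks[key]; !ok {
			s.chunks[key] = append([]byte(nil), p...)
		}
		r.ChunkHashes = append(r.ChunkHashes, key)
	}
	return r
}

// restore concatenates chunks in recipe order, regardless of which chunker
// produced them or which chunker is the current write default.
func (s *store) restore(r recipe) ([]byte, error) {
	var out []byte
	for _, h := range r.ChunkHashes {
		c, ok := s.chunks[h]
		if !ok {
			return nil, errors.New("missing chunk " + h)
		}
		out = append(out, c...)
	}
	return out, nil
}

// splitEvery cuts data into n-byte pieces (n must be > 0). It is a toy
// stand-in for a chunker version, not content-defined chunking.
func splitEvery(data []byte, n int) [][]byte {
	var out [][]byte
	for len(data) > n {
		out = append(out, data[:n])
		data = data[n:]
	}
	if len(data) > 0 {
		out = append(out, data)
	}
	return out
}

func main() {
	s := newStore()
	data := []byte("abcdefghabcd")
	r1 := s.put("toy-4", splitEvery(data, 4)) // abcd, efgh, abcd
	r2 := s.put("toy-8", splitEvery(data, 8)) // abcdefgh, abcd (abcd is reused)
	fmt.Println("unique chunks stored:", len(s.chunks))
	for _, r := range []recipe{r1, r2} {
		got, err := s.restore(r)
		fmt.Printf("%s -> %q (err=%v)\n", r.ChunkerVersion, got, err)
	}
}
```

The `abcd` chunk written under `toy-4` is reused by the `toy-8` recipe only because the bytes happen to match, which is what "opportunistic" reuse means here.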
Configure repository write default:
```bash
coldkeep config set default-chunker
```
This affects new writes only and does not rewrite existing data.
## Safety Guarantees (High-Level)
- restore correctness: stored files restore byte-identically,
- snapshot stability: snapshots remain valid across upgrades,
- non-destructive evolution: no automatic background re-chunking or silent rewrite,
- forward-compatible metadata: unknown but well-formed future chunker labels do not block restore.
For full guarantees, non-guarantees, and upgrade behavior details:
- [COMPATIBILITY.md](COMPATIBILITY.md)
- [ARCHITECTURE.md](ARCHITECTURE.md)
Legacy compatibility contract (v1.9):
- mandatory: old repositories remain readable/restorable
- not guaranteed: automatic rewrite, recompression, or eager migration of historical data
## When to use coldkeep
Good fit:
- cold/backup storage where correctness matters more than speed
- environments needing explicit integrity verification
- deduplication + deterministic restore use cases
Not a fit (v1.x scope):
- hot-path high-throughput storage
- distributed/multi-node coordination
## Quickstart
A small samples directory is included for local testing.
If you only want the fastest successful first run, use the Local (no Docker)
path below, then come back to the later sections as needed.
### Local (no Docker)
```bash
# 1) Initialize key material (.env)
coldkeep init
# 2) Load environment (export every variable defined in .env)
set -a
source .env
set +a
# 3) Configure local PostgreSQL connection (required for local mode)
export DB_HOST=127.0.0.1
export DB_PORT=5432
export DB_USER=coldkeep
export DB_PASSWORD=coldkeep
export DB_NAME=coldkeep
export DB_SSLMODE=disable
export COLDKEEP_DB_AUTO_BOOTSTRAP=true
# 4) Store and inspect
coldkeep store samples/hello.txt
coldkeep stats
# 5) Restore + verify
# restore expects file ID(s), not source filename
coldkeep restore 1 ./restored
coldkeep verify system --standard
```
Security note: if the encryption key is lost, encrypted data cannot be recovered.
Command form tips:
- `restore` expects logical file IDs (for example, `coldkeep restore 12 ./out`); use `--stored-path` if you want path-based restore.
- `verify` expects a target: `coldkeep verify system ...` or `coldkeep verify file ...`.
### Docker
```bash
# 1) Start services
docker compose up -d --build
# 2) Initialize key material on host-mounted workspace
docker compose run --rm -v "$PWD:/app" coldkeep init
# 3) Store a sample file
docker compose run --rm \
--env-file .env \
-v "$PWD/samples:/samples" \
coldkeep store /samples/hello.txt
```
## Smoke Validation (Two Approaches)
If you are preparing a PR, run the smoke gate (`scripts/smoke.sh`) with either
workflow below. Both are valid and both are used by contributors.
PR author tip: use the PR template at [`.github/pull_request_template.md`](.github/pull_request_template.md)
to summarize invariants and lifecycle-semantics impact for reviewers.
For a contributor-oriented local CI path before that, see [CONTRIBUTING.md](CONTRIBUTING.md).
If your change touches `coldkeep stats` or stats query shape, the same guide also includes a short stats benchmarking section with small/medium/large benchmark commands.
### Approach A: Docker runner
Use the `coldkeep` service container to run the smoke script.
```bash
# 1) Ensure PostgreSQL service is up
docker compose up -d coldkeep_postgres
# 2) Load encryption env from .env generated by coldkeep init
set -a
source .env
set +a
# 3) Run smoke inside the coldkeep container
docker compose run --rm \
-e COLDKEEP_KEY="$COLDKEEP_KEY" \
-e COLDKEEP_CODEC="$COLDKEEP_CODEC" \
-v "$PWD/samples:/samples:ro" \
--entrypoint sh coldkeep \
-lc 'apk add --no-cache jq >/dev/null && COLDKEEP_SAMPLES_DIR=/samples scripts/smoke.sh'
```
### Approach B: Host runner
Run the smoke script on host with a local binary, pointing to Docker PostgreSQL.
```bash
# 1) Ensure PostgreSQL service is up
docker compose up -d coldkeep_postgres
# 2) Build coldkeep locally and load encryption env
go build -o coldkeep ./cmd/coldkeep
set -a
source .env
set +a
# 3) Run smoke from host against Docker PostgreSQL
DB_HOST=127.0.0.1 \
DB_PORT=5432 \
DB_USER=coldkeep \
DB_PASSWORD=coldkeep \
DB_NAME=coldkeep \
DB_SSLMODE=disable \
PATH="$PWD:$PATH" \
./scripts/smoke.sh
# 4) Optional cleanup of local binary
rm -f coldkeep
```
Notes:
- `scripts/smoke.sh` requires `jq` and `coldkeep` on PATH in the execution environment.
- Containerized simulate checks may print a non-fatal warning about sqlite/cgo stubs; smoke continues unless `COLDKEEP_SMOKE_STRICT_SIMULATE=1` is set.
## CLI Basics
Typical flows:
```bash
coldkeep store file.txt
coldkeep store-folder ./data
coldkeep restore 12 ./out
coldkeep restore --stored-path docs/report.txt --destination ./out/report.txt --mode override
coldkeep remove 12
coldkeep gc
coldkeep stats
coldkeep list
coldkeep search report
coldkeep verify system --standard
coldkeep doctor
```
Simulation (no physical writes):
```bash
coldkeep simulate store-folder ./data
coldkeep simulate store file.txt --output json
```
Observability and GC simulation (read-only):
```bash
coldkeep stats
coldkeep stats --json
coldkeep inspect
coldkeep inspect --relations
coldkeep inspect --reverse
coldkeep inspect --deep --limit N
coldkeep simulate gc
coldkeep simulate gc --delete-snapshot
coldkeep simulate gc --containers
# trace diagnostics are emitted on stderr
coldkeep stats --trace
coldkeep inspect chunk --trace-json
coldkeep simulate gc --trace-json
```
Supported inspect entities currently include: `file` (alias: `logical-file`), `chunk`, `container`, and `snapshot`.
Observability command guarantees (v1.6):
- `stats`, `inspect`, and `simulate gc` are read-only command surfaces.
- `simulate gc` is an exact simulation of GC reclaimability under the same integrity gates. It uses the shared GC planning layer (`gc.BuildPlan`), including fully-dead active containers; it is not legacy `gc --dry-run` behavior.
- GC simulation does not mutate repository state (no database writes and no filesystem writes).
- JSON output is intended for tooling/automation contracts.
- `meta.version` is the CLI JSON contract version. It remains `v1.7` for additive, backward-compatible fields (including v1.8/v1.9 `stats.block_layout` additions) and only bumps on breaking JSON contract changes.
- Deep inspect output can be large; use `--limit N` to bound traversal output for operators and CI.
- `--trace` and `--trace-json` are diagnostics channels; traces are emitted to stderr so stdout data remains stable for piping.
- v1.8/v1.9 `stats` includes block-layout observability for packed storage: `storage_blocks_count`, `chunk_block_refs_count`, `avg_chunks_per_block`, `avg_block_plaintext_size`, `avg_block_stored_size`, `avg_block_fill_ratio`, `legacy_block_count`, `packed_block_count`, and `codec_distribution` when packed blocks are present.
Operator-facing v1.9 delta for common commands:
- `coldkeep store`, `restore`, `verify system --standard`, `gc --dry-run`, `gc`, `stats --json`, and `inspect` keep their existing invocation shape; v1.9 does not add new required flags to these commands.
- `stats` may include packed-block metrics in human and JSON output.
- `verify` may surface packed-block integrity categories such as packed block hash or metadata corruption.
- Block abstraction is documented, but remains a compatibility-layer change rather than a new mandatory operator workflow.
Chunker benchmark and interpretation:
```bash
coldkeep benchmark chunkers --output json
coldkeep benchmark run --dataset small --repeat 1 --output json
scripts/run_phase8_blocksize_matrix.sh --list-missing
```
v1.9 supports both CLI and scripted benchmark workflows.
- Use `coldkeep benchmark chunkers` and `coldkeep benchmark run` for operator-facing repeatable local measurements.
- Use `scripts/run_phase8_*.sh` and `scripts/compare_phase8_*.py` for release matrix orchestration and historical comparison workflows.
Typical outcomes to expect (informational ranges):
- Small modifications:
  - v1: ~92-96% reuse
  - v2: ~94-98% reuse
- Shifted data:
  - v1: ~5-20% reuse
  - v2: ~25-50% reuse
Interpretation note: the shifted-data reuse gap is the main justification signal for v2 FastCDC boundary stability improvements.
Critical insight: this indicates FastCDC improves not only dedup ratio, but dedup stability over time under boundary-shifting changes.
Common mistakes to avoid:
- Do not assert exact chunk counts; implementations can vary slightly while preserving correctness.
- Do not use non-deterministic input data; keep all generated data seed-driven for CI reliability.
- Do not ignore shifted-data comparisons; this is the most important stability signal.
- Do not overcomplicate metrics; keep interpretation focused on reuse percentage, chunk count, and coverage invariants.
## Batch Operations (v1.2)
Batch restore/remove/repair extends the automation contract with deterministic orchestration and invariant-aware reporting.
```bash
coldkeep restore 12 18 24 ./out
coldkeep remove 12 18 24
coldkeep remove --input ids.txt
coldkeep remove --stored-paths /data/a.txt /data/b.txt --input paths.txt
coldkeep repair ref-counts --batch
coldkeep repair --batch --input repair_targets.txt
coldkeep restore 12 18 ./out --dry-run
```
Current `repair --batch` scope is target-oriented, not item-oriented:
- today the only supported target is `ref-counts`
- input files for `repair --batch --input` currently contain repeated target names such as `ref-counts`
- they do not contain file IDs or stored paths
Semantics (summary):
- per-item isolation by default
- optional fail-fast for execution failures
- duplicate target skipping
- deterministic per-item report ordering
- JSON status values are intentionally two-layered:
  - overall payload status: `ok`, `partial_failure`, `error`
  - per-item result status: `success`, `failed`, `skipped`, `planned`
- JSON execution mode is explicit: `continue_on_error` (default) or `fail_fast`
- process exit is automation-friendly:
  - 0 when no item fails
  - 1 when one or more items fail
  - 2 for pre-execution validation/usage failures (including empty effective target sets after parsing input)
Example JSON payload:
```json
{
  "status": "partial_failure",
  "operation": "repair",
  "dry_run": false,
  "execution_mode": "continue_on_error",
  "summary": {
    "total": 2,
    "succeeded": 1,
    "failed": 1,
    "skipped": 0
  },
  "results": [
    {
      "id": "ref-counts",
      "status": "success",
      "message": "logical_file ref_count values repaired"
    },
    {
      "id": "ref-counts",
      "status": "failed",
      "message": "repair refused: orphan physical_file rows detected",
      "invariant_code": "REPAIR_REFUSED_ORPHAN_ROWS",
      "recommended_action": "Remove or correct orphan physical_file rows before retrying repair."
    }
  ]
}
```
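For automation that consumes this payload, a Go client might decode it as sketched below. The struct and function names are hypothetical; the field names are taken from the example payload only, so fields that appear in other reports are not covered. `expectedExitCode` mirrors the documented 0/1 rule for a report that was produced; exit code 2 is raised before execution and is outside this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type itemResult struct {
	ID                string `json:"id"`
	Status            string `json:"status"` // success, failed, skipped, planned
	Message           string `json:"message"`
	InvariantCode     string `json:"invariant_code,omitempty"`
	RecommendedAction string `json:"recommended_action,omitempty"`
}

type batchReport struct {
	Status        string `json:"status"` // ok, partial_failure, error
	Operation     string `json:"operation"`
	DryRun        bool   `json:"dry_run"`
	ExecutionMode string `json:"execution_mode"`
	Summary       struct {
		Total     int `json:"total"`
		Succeeded int `json:"succeeded"`
		Failed    int `json:"failed"`
		Skipped   int `json:"skipped"`
	} `json:"summary"`
	Results []itemResult `json:"results"`
}

func decode(raw string) (batchReport, error) {
	var r batchReport
	err := json.Unmarshal([]byte(raw), &r)
	return r, err
}

// expectedExitCode applies the documented rule: 1 if any item failed, else 0.
func expectedExitCode(r batchReport) int {
	for _, item := range r.Results {
		if item.Status == "failed" {
			return 1
		}
	}
	return 0
}

// sample is a compact copy of the example payload.
const sample = `{
  "status": "partial_failure", "operation": "repair", "dry_run": false,
  "execution_mode": "continue_on_error",
  "summary": {"total": 2, "succeeded": 1, "failed": 1, "skipped": 0},
  "results": [
    {"id": "ref-counts", "status": "success",
     "message": "logical_file ref_count values repaired"},
    {"id": "ref-counts", "status": "failed",
     "message": "repair refused: orphan physical_file rows detected",
     "invariant_code": "REPAIR_REFUSED_ORPHAN_ROWS",
     "recommended_action": "Remove or correct orphan physical_file rows before retrying repair."}
  ]
}`

func main() {
	r, err := decode(sample)
	if err != nil {
		panic(err)
	}
	for _, item := range r.Results {
		if item.Status == "failed" {
			fmt.Printf("%s failed [%s]: %s\n", item.ID, item.InvariantCode, item.RecommendedAction)
		}
	}
	fmt.Println("overall:", r.Status, "expected exit code:", expectedExitCode(r))
}
```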
For full batch contract details and examples, see [ARCHITECTURE.md](ARCHITECTURE.md) and [PRE_RELEASE_CHECKLIST.md](PRE_RELEASE_CHECKLIST.md).
## Snapshot Layer (v1.4)
coldkeep snapshots capture a complete, immutable, point-in-time view of your stored files as they exist in the current system state.
Even when using `--from`, snapshots are always fully self-contained and do not depend on their parent.
Critical clarity:
- Snapshots are always self-contained.
- `--from` records lineage metadata for analysis only.
- `--from` does not create dependencies.
- A child snapshot restore never requires reading parent snapshot content.
### Creating snapshots
v1.4 flow example:
```bash
# Create initial snapshot
coldkeep snapshot create --id day1
# Modify files...
# Create snapshot with lineage
coldkeep snapshot create --id day2 --from day1
# Understand changes
coldkeep snapshot diff day1 day2 --summary
# Inspect snapshot reuse
coldkeep snapshot stats day2
# Visualize history
coldkeep snapshot list --tree
# Preview deletion
coldkeep snapshot delete day1 --dry-run
```
```bash
# Full snapshot (all physical_file entries)
coldkeep snapshot create
# Full snapshot with lineage metadata
coldkeep snapshot create --id day2 --from day1
# Partial snapshot (exact paths and/or directory prefixes)
coldkeep snapshot create docs/ report.txt --label release-2026-04
```
- `--id`: snapshot_id system identifier. This is the command target for `show`, `restore`, `stats`, `diff`, and `delete`.
- `--label`: optional user-facing metadata only. It is not an identifier and is never used for command targeting.
- `--from`: optional parent snapshot lineage metadata on create. This is informational only and does not create any parent-content dependency during create or restore.
`--from` behavior:
- snapshot recorded as derived from parent
- does not create a dependency
- snapshot content is still built from current system state
- parent relationship is used for comparison and visualization only
Current lineage scope policy:
- `--from` is currently supported only for full snapshots.
- Parent snapshot referenced by `--from` must also be full.
- Filtered parent/child lineage for partial snapshots is intentionally rejected in this phase.
Snapshot command targeting contract:
- There is no `--snapshot` selector flag for snapshot subcommands.
- Pass snapshot_id positionally (for example: `coldkeep snapshot restore snap-abc123`).
### Listing and inspecting
```bash
coldkeep snapshot list
coldkeep snapshot list --type full --limit 10 --since 2026-01-01
coldkeep snapshot list --tree
coldkeep snapshot show snap-abc123
coldkeep snapshot show snap-abc123 --limit 50
coldkeep snapshot show snap-abc123 --prefix docs/
coldkeep snapshot show snap-abc123 --pattern "docs/*.txt" --min-size 1024
coldkeep snapshot stats
coldkeep snapshot stats snap-abc123
```
`snapshot list --tree` renders a lineage view from snapshot metadata (`id`, `parent_id`, `created_at`).
If a parent snapshot was deleted, affected snapshots are still shown as roots; snapshot usability is unchanged.
The tree shows lineage metadata only; it is not a dependency graph for restore execution.
Conceptual lineage example:
```text
day1
└── day2
└── day3
```
Each snapshot is independent despite this structure.
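The tree rendering and the re-rooting behavior after a parent delete can be sketched from `id` and `parent_id` alone. This is an illustrative model, not the CLI's renderer: the `snapshot` struct and `renderTree` are hypothetical, and sorting siblings by ID is an assumption made for determinism (the real metadata also carries `created_at`).

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

type snapshot struct {
	ID       string
	ParentID string // lineage metadata only; empty for snapshots without --from
}

// renderTree returns one line per snapshot. A snapshot whose parent is empty
// or no longer exists (for example, deleted) is shown as a root.
func renderTree(snaps []snapshot) []string {
	known := map[string]bool{}
	for _, s := range snaps {
		known[s.ID] = true
	}
	children := map[string][]string{}
	var roots []string
	for _, s := range snaps {
		if s.ParentID == "" || !known[s.ParentID] {
			roots = append(roots, s.ID)
		} else {
			children[s.ParentID] = append(children[s.ParentID], s.ID)
		}
	}
	sort.Strings(roots)

	var lines []string
	var walk func(id string, depth int)
	walk = func(id string, depth int) {
		prefix := ""
		if depth > 0 {
			prefix = strings.Repeat("    ", depth-1) + "└── "
		}
		lines = append(lines, prefix+id)
		kids := children[id]
		sort.Strings(kids)
		for _, k := range kids {
			walk(k, depth+1)
		}
	}
	for _, r := range roots {
		walk(r, 0)
	}
	return lines
}

func main() {
	full := []snapshot{{ID: "day1"}, {ID: "day2", ParentID: "day1"}, {ID: "day3", ParentID: "day2"}}
	fmt.Println(strings.Join(renderTree(full), "\n"))

	fmt.Println("after deleting day1:")
	fmt.Println(strings.Join(renderTree(full[1:]), "\n"))
}
```

Nothing in this structure is consulted during restore, which is why deleting `day1` changes only the picture and not the usability of `day2` or `day3`.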
`snapshot list --tree`:
- displays snapshots as a lineage tree based on parent relationships
- reflects metadata lineage only (not restore dependency)
`snapshot stats` lineage context:
- when a parent snapshot is available, stats include reused files, new files, and reuse ratio
- if the parent snapshot is missing, stats fall back gracefully with explanatory output
Snapshot file queries are reusable across `snapshot show`, `snapshot restore`, and `snapshot diff`.
Supported query flags:
- `--path`: exact normalized snapshot path match; repeatable
- `--prefix`: normalized directory prefix match; repeatable and must end with `/`
- `--pattern`: slash-path glob (`path.Match`) against the normalized snapshot path
- `--regex`: regular expression against the snapshot path
- `--min-size` / `--max-size`: inclusive logical size range
- `--modified-after` / `--modified-before`: inclusive mtime window
All active criteria are ANDed together. Path and prefix inputs are normalized before matching, and result ordering remains deterministic.
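A rough model of how these criteria combine is sketched below, using `path.Match` for `--pattern` as the flag description states. The `query` and `entry` types are hypothetical, and several behaviors are assumptions rather than documented facts: repeated `--path` or `--prefix` values are treated as alternatives within that one criterion, `path.Clean` stands in for the normalization step, a zero size bound means "unset", and the mtime window is omitted for brevity.

```go
package main

import (
	"fmt"
	"path"
	"regexp"
	"strings"
)

type entry struct {
	Path string
	Size int64
}

type query struct {
	Paths    []string // --path (repeatable)
	Prefixes []string // --prefix (repeatable, each ends with "/")
	Pattern  string   // --pattern
	Regex    string   // --regex
	MinSize  int64    // --min-size; 0 means unset (assumption)
	MaxSize  int64    // --max-size; 0 means unset (assumption)
}

// anyOf reports whether an inactive criterion (no values) or any value passes.
func anyOf(values []string, ok func(string) bool) bool {
	if len(values) == 0 {
		return true
	}
	for _, v := range values {
		if ok(v) {
			return true
		}
	}
	return false
}

// matches ANDs every active criterion against one snapshot entry.
func (q query) matches(e entry) (bool, error) {
	p := path.Clean(e.Path)
	if !anyOf(q.Paths, func(v string) bool { return path.Clean(v) == p }) {
		return false, nil
	}
	if !anyOf(q.Prefixes, func(v string) bool { return strings.HasPrefix(p, v) }) {
		return false, nil
	}
	if q.Pattern != "" {
		ok, err := path.Match(q.Pattern, p) // "*" does not cross "/" boundaries
		if err != nil || !ok {
			return false, err
		}
	}
	if q.Regex != "" {
		ok, err := regexp.MatchString(q.Regex, p)
		if err != nil || !ok {
			return false, err
		}
	}
	if q.MinSize > 0 && e.Size < q.MinSize {
		return false, nil
	}
	if q.MaxSize > 0 && e.Size > q.MaxSize {
		return false, nil
	}
	return true, nil
}

func main() {
	q := query{Prefixes: []string{"docs/"}, Pattern: "docs/*.txt", MinSize: 1024}
	for _, e := range []entry{
		{"docs/report.txt", 2048},
		{"docs/sub/notes.txt", 2048},
		{"docs/small.txt", 10},
	} {
		ok, err := q.matches(e)
		fmt.Printf("%-20s match=%v err=%v\n", e.Path, ok, err)
	}
}
```

Note that `docs/*.txt` does not match `docs/sub/notes.txt`: in `path.Match`, `*` matches any sequence of non-`/` characters.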
### Restoring from a snapshot
```bash
# Restore all files to their original paths
coldkeep snapshot restore snap-abc123
# Restore a subdirectory under a new prefix
coldkeep snapshot restore snap-abc123 docs/ --mode prefix --destination ./restored
# Restore a single file to an explicit destination
coldkeep snapshot restore snap-abc123 docs/report.txt --mode override --destination ./out/report.txt
# Restore only matching files from the snapshot query layer
coldkeep snapshot restore snap-abc123 --prefix docs/ --pattern "docs/*.txt" --mode prefix --destination ./restored
```
### Diffing two snapshots
`snapshot diff` compares two snapshots by path and logical file identity, classifying each change as `added`, `removed`, or `modified`.
When query filters include size or mtime constraints, diff evaluates `added` and `modified` entries against target-snapshot metadata, and `removed` entries against base-snapshot metadata.
A file is considered modified if its content changes, even when the path stays the same.
```bash
# Show all changes between two snapshots
coldkeep snapshot diff snap-1 snap-2
# Show only added files
coldkeep snapshot diff snap-1 snap-2 --filter added
# Restrict the diff view to a path subset
coldkeep snapshot diff snap-1 snap-2 --prefix docs/
# Return summary counts only (no per-entry list)
coldkeep snapshot diff snap-1 snap-2 --summary
# Combine diff classification with snapshot query filters
coldkeep snapshot diff snap-1 snap-2 --filter modified --regex "\\.yaml$"
# Machine-readable JSON output
coldkeep snapshot diff snap-1 snap-2 --output json
```
Text output example:
```text
[SNAPSHOT DIFF]
Base: snap-1
Target: snap-2
+ docs/new.txt
- docs/old.txt
~ docs/config.yaml
Summary:
added: 1
removed: 1
modified: 1
```
JSON output schema:
```json
{
  "status": "ok",
  "command": "snapshot diff",
  "data": {
    "base": "snap-1",
    "target": "snap-2",
    "summary": { "added": 1, "removed": 1, "modified": 1 },
    "entries": [
      { "path": "docs/new.txt", "type": "added", "base_logical_id": null, "target_logical_id": 2 },
      { "path": "docs/old.txt", "type": "removed", "base_logical_id": 1, "target_logical_id": null },
      { "path": "docs/config.yaml", "type": "modified", "base_logical_id": 3, "target_logical_id": 4 }
    ],
    "duration_ms": 12
  }
}
```
`--filter` limits output to one change type (`added`, `removed`, or `modified`). Summary counts reflect the filtered set.
`--summary` returns counts only and skips detailed `entries` output.
`snapshot diff --summary`:
- displays a summary of changes
- includes added, removed, and modified counts
The JSON contract for snapshot commands is unchanged. Query flags only reduce the returned `files` or `entries` collections and the derived counts; field names and envelope structure remain stable.
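The classification rule (compare by path and logical file identity) can be sketched as a map comparison. This is a simplified model with hypothetical names: each snapshot is reduced to a path-to-logical-ID map, nil pointers play the role of JSON `null`, and sorting by path is an assumption made to keep the output deterministic; the real command may order entries differently.

```go
package main

import (
	"fmt"
	"sort"
)

type change struct {
	Path     string
	Type     string // added, removed, or modified
	BaseID   *int64 // nil when the path is absent from the base snapshot
	TargetID *int64 // nil when the path is absent from the target snapshot
}

// diff classifies every path whose logical file identity differs between
// base and target. Paths with the same logical ID on both sides are omitted.
func diff(base, target map[string]int64) []change {
	var out []change
	for p, id := range base {
		baseID := id // copy so each entry gets its own pointer
		targetID, ok := target[p]
		switch {
		case !ok:
			out = append(out, change{Path: p, Type: "removed", BaseID: &baseID})
		case targetID != baseID:
			out = append(out, change{Path: p, Type: "modified", BaseID: &baseID, TargetID: &targetID})
		}
	}
	for p, id := range target {
		if _, ok := base[p]; !ok {
			targetID := id
			out = append(out, change{Path: p, Type: "added", TargetID: &targetID})
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Path < out[j].Path })
	return out
}

func main() {
	base := map[string]int64{"docs/old.txt": 1, "docs/config.yaml": 3, "same.txt": 5}
	target := map[string]int64{"docs/new.txt": 2, "docs/config.yaml": 4, "same.txt": 5}
	summary := map[string]int{}
	for _, c := range diff(base, target) {
		summary[c.Type]++
		fmt.Printf("%-9s %s\n", c.Type, c.Path)
	}
	fmt.Printf("added=%d removed=%d modified=%d\n",
		summary["added"], summary["removed"], summary["modified"])
}
```

Because identity is the logical file ID rather than the path alone, `docs/config.yaml` is reported as modified even though its path is unchanged.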
### Deleting a snapshot
```bash
coldkeep snapshot delete snap-abc123 --force
coldkeep snapshot delete snap-abc123 --dry-run
```
Deleting a snapshot removes metadata only: the snapshot row and its `snapshot_file` entries. The underlying logical files and blocks are not touched, and data remains retained while it is still referenced by other snapshots or current state.
`--dry-run` is read-only and reports impact details (lineage preview and file-count breakdown) without applying changes.
Dry-run impact describes metadata/reference effects and does not guarantee disk-space reclamation.
When both `--force` and `--dry-run` are passed, `--dry-run` takes precedence and the command remains read-only.
`snapshot delete --dry-run` preview includes:
- number of files referenced by the snapshot
- files unique to this snapshot
- files shared with other snapshots
- lineage impact
No data is modified in dry-run mode.
### Safe lineage workflow (v1.4)
Use this sequence when operating on parent/child snapshots:
```bash
# 1) Create baseline and child lineage metadata
coldkeep snapshot create --id day1
coldkeep snapshot create --id day2 --from day1
# 2) Review lineage and impact before delete
coldkeep snapshot list --tree
coldkeep snapshot delete day1 --dry-run
# 3) If approved, delete parent metadata
coldkeep snapshot delete day1 --force
# 4) Verify child remains independently restorable
coldkeep snapshot restore day2
```
Expected behavior:
- Deleting `day1` changes lineage metadata and future GC eligibility only.
- `day2` remains restorable because snapshots are self-contained.
- `snapshot list --tree` may re-root children after parent delete; restore behavior is unchanged.
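The reason `day2` survives the parent delete can be shown with a toy model in which each snapshot carries its own complete file listing and the parent link is metadata only. Everything here is a hypothetical illustration (`Catalog`, `Snapshot`, and the choice to turn orphaned children into roots are assumptions), not coldkeep's actual schema.

```go
package main

import (
	"fmt"
	"sort"
)

// Snapshot holds a full listing of its own; Parent is lineage metadata.
type Snapshot struct {
	Parent string           // "" marks a root
	Files  map[string]int64 // path -> logical file ID
}

// Catalog is a hypothetical in-memory snapshot index.
type Catalog map[string]*Snapshot

// Create records a snapshot with its own copy of the file listing.
func (c Catalog) Create(id, parent string, files map[string]int64) {
	own := make(map[string]int64, len(files))
	for p, f := range files {
		own[p] = f
	}
	c[id] = &Snapshot{Parent: parent, Files: own}
}

// Delete removes the snapshot and re-roots its direct children
// (assumption: orphaned children become roots).
func (c Catalog) Delete(id string) {
	delete(c, id)
	for _, s := range c {
		if s.Parent == id {
			s.Parent = ""
		}
	}
}

// RestorePaths lists what a restore needs, using only the
// snapshot's own listing and never its parent.
func (c Catalog) RestorePaths(id string) ([]string, error) {
	s, ok := c[id]
	if !ok {
		return nil, fmt.Errorf("snapshot %q not found", id)
	}
	paths := make([]string, 0, len(s.Files))
	for p := range s.Files {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	return paths, nil
}

func main() {
	c := Catalog{}
	c.Create("day1", "", map[string]int64{"docs/a.txt": 1})
	c.Create("day2", "day1", map[string]int64{"docs/a.txt": 1, "docs/b.txt": 2})

	c.Delete("day1")

	paths, err := c.RestorePaths("day2")
	fmt.Println(paths, err, "re-rooted:", c["day2"].Parent == "")
}
```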
### Snapshot release gate (operator quick checklist)
Before tagging a release, run the dedicated snapshot/retention contract gate in [PRE_RELEASE_CHECKLIST.md](PRE_RELEASE_CHECKLIST.md).
For the focused automated snapshot gate, run:
```bash
scripts/run_snapshot_release_gate.sh --count 1
```
Run the checklist step-by-step and in order. For the manual snapshot lifecycle gate, use a stable snapshot identifier (for example via `snapshot create --id pre-gc-gate`) and pass snapshot IDs positionally in `snapshot restore`, `snapshot diff`, and `snapshot delete`.
Manual lifecycle expected in the release gate:
- create snapshot
- remove current mapping
- confirm GC dry-run reports snapshot-retained logical files
- restore from snapshot
- diff two snapshots
- delete snapshot
- confirm GC eligibility changes only after delete
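The retention rule that this lifecycle exercises can be sketched as follows: a logical file is GC-eligible only when neither the current mapping nor any snapshot references it. The `Repo` type and its methods are hypothetical stand-ins for the real metadata, meant only to make the expected sequence of outcomes concrete.

```go
package main

import (
	"fmt"
	"sort"
)

// Repo is a hypothetical in-memory view of retention roots.
type Repo struct {
	Current   map[string]int64   // current path -> logical file ID
	Snapshots map[string][]int64 // snapshot ID -> logical file IDs
}

// retainers lists every root that still references the logical file.
func (r Repo) retainers(fileID int64) []string {
	var out []string
	for path, id := range r.Current {
		if id == fileID {
			out = append(out, "current:"+path)
		}
	}
	for snap, ids := range r.Snapshots {
		for _, id := range ids {
			if id == fileID {
				out = append(out, "snapshot:"+snap)
				break
			}
		}
	}
	sort.Strings(out)
	return out
}

// gcEligible is true only when no root retains the logical file.
func (r Repo) gcEligible(fileID int64) bool {
	return len(r.retainers(fileID)) == 0
}

func main() {
	repo := Repo{
		Current:   map[string]int64{"docs/a.txt": 1},
		Snapshots: map[string][]int64{},
	}

	repo.Snapshots["pre-gc-gate"] = []int64{1} // create snapshot
	delete(repo.Current, "docs/a.txt")         // remove current mapping
	fmt.Println(repo.gcEligible(1), repo.retainers(1))

	delete(repo.Snapshots, "pre-gc-gate") // delete snapshot
	fmt.Println(repo.gcEligible(1), repo.retainers(1))
}
```

In this model the first check reports the file as snapshot-retained, and eligibility flips only after the snapshot is deleted, which matches the order of the manual gate.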
For the full release criteria, use the snapshot sign-off sections in [PRE_RELEASE_CHECKLIST.md](PRE_RELEASE_CHECKLIST.md):
- `15) Snapshot sign-off checklist (Phases 1-7)`
- `C. Test surface checklist`
- `D. Documentation / release checklist`
- `15) Verify snapshot / retention contract (manual gate)`
- `16) Final global sign-off`
When opening the release PR, use [`.github/pull_request_template.md`](.github/pull_request_template.md)
to keep impact and validation context explicit.
### Future Hardening Backlog (non-blocking)
- Add fuzz coverage for snapshot query combinations (`--regex`, `--pattern`, `--prefix`) to further harden parser+matcher edge cases.
- This is a future hardening task and is not part of the current release gate.
## Doctor (recommended health gate)
`coldkeep doctor` is the operator health gate:
- runs recovery first (corrective)
- then schema/version sanity checks
- then verification (standard by default; full/deep optional)
Doctor is intentionally corrective, not read-only: the recovery step runs before any checks and may change repository state.
```bash
coldkeep doctor
coldkeep doctor --full
coldkeep doctor --deep --output json
```
### Recovery strict mode
Startup recovery runs in strict mode by default.
```bash
COLDKEEP_STRICT_RECOVERY=true # default; fail-closed on ambiguous state
COLDKEEP_STRICT_RECOVERY=false # operator escape hatch for recovery investigation
```
Strict mode is the recommended production setting. Disabling it should be treated as a temporary operator escape hatch only, not a normal operating mode.
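A default-on toggle like this is typically read so that an unset variable means strict. The sketch below shows one way to do that with `strconv.ParseBool`; the `strictRecovery` function is hypothetical, and treating an unparseable value as strict (fail-closed) is an assumption, not documented coldkeep behavior.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// strictRecovery reports whether strict recovery should be enabled.
// lookup has the same signature as os.LookupEnv so it can be faked.
func strictRecovery(lookup func(string) (string, bool)) bool {
	raw, set := lookup("COLDKEEP_STRICT_RECOVERY")
	if !set {
		return true // documented default: strict
	}
	enabled, err := strconv.ParseBool(strings.TrimSpace(raw))
	if err != nil {
		// Assumption: an unrecognized value keeps strict mode on
		// rather than silently relaxing recovery.
		return true
	}
	return enabled
}

func main() {
	fmt.Println("strict recovery:", strictRecovery(os.LookupEnv))
}
```

With this shape, only an explicit false-like value (for example `false` or `0`) disables strict mode.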
## Verification
Verification levels:
- standard: metadata integrity
- full: structural/container integrity
- deep: full content read + hash validation
```bash
coldkeep verify system --standard
coldkeep verify system --full
coldkeep verify system --deep
```
Verification checks are observational. In CLI flows, startup recovery may run before verification.
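The deep level (full content read plus hash validation) can be illustrated with a streaming hash comparison. This is a minimal sketch under stated assumptions: SHA-256 is assumed as the content hash, and `hashReader`, `hashBytes`, `verifyContent`, and `verifyBytes` are hypothetical helpers, not coldkeep APIs.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
)

// hashReader streams r through SHA-256 (assumed algorithm) and
// returns the hex-encoded digest.
func hashReader(r io.Reader) (string, error) {
	h := sha256.New()
	if _, err := io.Copy(h, r); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// hashBytes is a convenience wrapper for in-memory data.
func hashBytes(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// verifyContent reads all of r and compares its digest with wantHex.
// It only observes: nothing is repaired or rewritten.
func verifyContent(r io.Reader, wantHex string) error {
	got, err := hashReader(r)
	if err != nil {
		return fmt.Errorf("read content: %w", err)
	}
	if got != wantHex {
		return fmt.Errorf("hash mismatch: got %s, want %s", got, wantHex)
	}
	return nil
}

// verifyBytes applies verifyContent to in-memory data.
func verifyBytes(data []byte, wantHex string) error {
	return verifyContent(bytes.NewReader(data), wantHex)
}

func main() {
	stored := []byte("chunk payload")
	recorded := hashBytes(stored) // digest recorded at store time

	fmt.Println("intact:", verifyBytes(stored, recorded) == nil)
	fmt.Println("corrupted:", verifyBytes([]byte("chunk pAyload"), recorded) == nil)
}
```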
## Documentation Map
- Architecture and internals: [ARCHITECTURE.md](ARCHITECTURE.md)
- Guarantee mapping and evidence: [VALIDATION_MATRIX.md](VALIDATION_MATRIX.md)
- Contribution workflow: [CONTRIBUTING.md](CONTRIBUTING.md)
- Release readiness flow: [PRE_RELEASE_CHECKLIST.md](PRE_RELEASE_CHECKLIST.md)
- Security reporting and threat guidance: [SECURITY.md](SECURITY.md)
- Current-state path identity policy: [docs/PATH_IDENTITY.md](docs/PATH_IDENTITY.md)
- Benchmark infrastructure and baseline policy: [docs/benchmarking.md](docs/benchmarking.md)
- Frozen v1.9 benchmark baseline contract: [docs/internal/benchmark_baselines_v1_9.md](docs/internal/benchmark_baselines_v1_9.md)
- Milestone history: [CHANGELOG.md](CHANGELOG.md)
## Roadmap note (post-v1.9)
Current status:
- v1.2 physical mapping/repair and audited GC root gates are complete.
- v1.3/v1.4 snapshot-retention correctness and lifecycle clarity are complete.
- v1.5 chunker-evolution compatibility contract is complete.
- v1.6 read-only observability and exact GC simulation tooling are complete.
- v1.7 controlled-execution performance validation and release-readiness hardening are complete.
- v1.8 packed block abstraction, AES-GCM packed-block integration, and release hardening are complete.
- v1.9 transform-based storage semantics, block-level compression, and staged verification are complete.
Next focus is v1.10: architecture extraction on top of frozen v1.9 storage semantics.
## Contributing
Contributions and discussions are welcome.
See [CONTRIBUTING.md](CONTRIBUTING.md).
## License
Apache-2.0. See [LICENSE](LICENSE).