{"id":50288442,"url":"https://github.com/themankindproject/keplordb","last_synced_at":"2026-05-28T04:03:29.150Z","repository":{"id":354485464,"uuid":"1215103977","full_name":"themankindproject/keplordb","owner":"themankindproject","description":"A columnar log engine optimized for high-throughput ingestion of structured, append-only, time-ordered events.","archived":false,"fork":false,"pushed_at":"2026-04-28T18:17:38.000Z","size":993,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-28T20:19:37.382Z","etag":null,"topics":["append-only","avx2","columnar-storage","database","db","embeddable","iot","log-engine","logging","mmap","rust","simd","telemetry","time-series","wal"],"latest_commit_sha":null,"homepage":"https://keplordb.pages.dev/","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/themankindproject.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-19T13:40:54.000Z","updated_at":"2026-04-28T18:17:42.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/themankindproject/keplordb","commit_stats":null,"previous_names":["themankindproject/keplordb"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/themankindproject/keplordb","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/themankindproject%2Fkeplordb","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/themankindproject%2Fkeplordb/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/themankindproject%2Fkeplordb/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/themankindproject%2Fkeplordb/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/themankindproject","download_url":"https://codeload.github.com/themankindproject/keplordb/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/themankindproject%2Fkeplordb/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33593401,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-28T02:00:06.440Z","response_time":99,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["append-only","avx2","columnar-storage","database","db","embeddable","iot","log-engine","logging","mmap","rust","simd","telemetry","time-series","wal"],"created_at":"2026-05-28T04:03:27.036Z","updated_at":"2026-05-28T04:03:29.132Z","avatar_url":"https://github.com/themankindproject.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"# KeplorDB\n\n[![crates.io](https://img.shields.io/crates/v/keplordb.svg)](https://crates.io/crates/keplordb)\n[![docs.rs](https://docs.rs/keplordb/badge.svg)](https://docs.rs/keplordb)\n[![CI](https://github.com/themankindproject/keplordb/actions/workflows/ci.yml/badge.svg)](https://github.com/themankindproject/keplordb/actions)\n[![License: Apache-2.0](https://img.shields.io/badge/license-Apache--2.0-blue.svg)](./LICENSE)\n\nA columnar append-only log engine written in Rust — purpose-built for high-throughput structured event ingestion. LLM observability, HTTP access logs, IoT telemetry, payment ledgers — any workload that's append-only and time-ordered.\n\n## Why\n\nExisting options are either too heavy (ClickHouse, TimescaleDB) or too general (SQLite, RocksDB). **KeplorDB is an embeddable library** with no server process, no SQL parser, and no background threads. Declare your schema with a derive macro, open the engine, append events, query columns.\n\n## Install\n\n```toml\n[dependencies]\nkeplordb = \"0.1\"\n```\n\nOr via git (pre-crates.io):\n\n```toml\n[dependencies]\nkeplordb = { git = \"https://github.com/themankindproject/keplordb\" }\n```\n\nRequires Rust **1.82** or newer. The workspace ships two crates — `keplordb` (engine) and `keplordb-macros` (the `#[derive(Schema)]` proc-macro, re-exported from the main crate). One dep gets both.\n\n## Quick Start — typed schema derive\n\n```rust\nuse keplordb::{Engine, Schema};\n\n#[derive(Schema)]\n#[keplordb(id = 1)]\npub struct AiCall {\n    #[dim(bloom, rollup)] pub user:   String,\n    #[dim(rollup)]        pub org:    String,\n    #[dim]                pub model:  String,\n    #[counter]            pub input_tokens:  u32,\n    #[counter]            pub output_tokens: u32,\n    #[label]              pub region: String,\n}\n\nfn main() -\u003e Result\u003c(), keplordb::DbError\u003e {\n    // `default_config` prefills bloom_dim + rollup_dims from the schema.\n    let engine: Engine\u003c3, 2, 1\u003e =\n        Engine::open(AiCall::default_config(\"/tmp/logs\".into()))?;\n\n    // Fluent typed builder — no positional indexing.\n    engine.append(\n        \u0026AiCall::new(ts_ns())\n            .user(\"alice\")\n            .org(\"acme\")\n            .model(\"gpt-4o\")\n            .input_tokens(1_200)\n            .output_tokens(850)\n            .region(\"us-east\")\n            .metric(5_000_000)  // cost in nanodollars\n            .status(200)\n            .into_log_event(),\n    )?;\n\n    // Named filter — maps each setter to the right dim internally.\n    let alice = engine.aggregate(\n        \u0026AiCall::filter().user(\"alice\").into_filter(),\n    )?;\n    println!(\"alice events: {}, cost sum: {}\", alice.event_count, alice.metric);\n\n    engine.flush()?;\n    Ok(())\n}\n\nfn ts_ns() -\u003e i64 {\n    std::time::SystemTime::now()\n        .duration_since(std::time::UNIX_EPOCH)\n        .unwrap()\n        .as_nanos() as i64\n}\n```\n\nThe raw positional `LogEvent\u003cD, C, L\u003e` + `QueryFilter\u003cD\u003e` API is still available for dynamic / runtime-decided schemas — see [`examples/raw_positional.rs`](crates/keplordb/examples/raw_positional.rs).\n\n## Features\n\n### Ergonomics\n\n- **Typed schema derive** — `#[derive(keplordb::Schema)]` generates typed event + filter builders. No positional `dims[0]` indexing anywhere in user code\n- **Compile-time validation** — wrong counter type, two bloom dims, missing `schema_id`, or caps exceeded all rejected by the macro with spans pointing at your source\n- **Schema ID guard** — segment headers record the active schema; opening a data directory with a mismatched `schema_id` fails fast with `DbError::Corrupt`\n\n### Storage\n\n- **Columnar segments** — queries touch only the columns they need; aggregations scan contiguous arrays via mmap\n- **Zero-copy reads** — u32/u16 columns read directly from mmap'd segment files via `zerocopy`; no deserialization\n- **Pre-decompressed i64 cache** — `ts_ns` + `metric` delta-decoded once at segment open and reused across readers\n- **Intern table cache** — string → u16 resolve table decompressed once per segment via `OnceLock`; 170× faster filtered aggregate after warmup\n\n### Indexing\n\n- **Bloom filter skip** — per-segment bloom on the primary dimension; skip entire files on mismatch\n- **Zone maps per-dimension** — min/max per 256-row chunk, built with AVX2 SIMD min/max during rotation; chunks that can't match a filter are skipped before any column access\n- **Status bitmap index** — per-value compressed bitmaps for O(1) status lookups; full-scan fallback\n\n### Throughput\n\n- **AVX2 SIMD aggregation** — vectorized sum, count, filtered-sum using 256-bit registers with hardware prefetching and scalar fallback\n- **Sharded WAL, batch-routed** — each `append_batch` claims one shard via `fetch_add`; N concurrent writers → N shards, zero contention in the common case\n- **Rayon parallel scan** — cross-segment aggregates fan out across cores\n- **`query_recent` global merge** — candidates sorted by `max_ts` descending, per-segment results pooled + merged by `ts_ns` descending, with early termination once the kth-best ts exceeds any remaining segment's `max_ts`\n\n### Durability\n\n- **CRC32-framed WAL** — every record checksummed; partial frames detected and recovered up to the last complete frame on replay\n- **Crash-safe rotation** — three-phase `rename → write segment → unlink`; orphaned `*.wal.rotating` files replayed on next open\n- **Tunable fsync** — batched per `wal_sync_interval` (default 64 events) or `wal_sync_bytes` (default 256 KB). Set to `1` each for zero-loss, or `u32::MAX` for best-effort\n\n### Operations\n\n- **Segment-level GC** — `engine.gc(cutoff)` drops segments whose `max_ts` is below the threshold. No compaction, no write amplification, no read pause\n- **Lock-free reads** — segment manifest + tombstones behind `ArcSwap`; one atomic load per query\n- **Embeddable** — `Engine::open()` in your Rust binary. No TCP, no SQL, no external service\n\n## Performance\n\nMeasured with Criterion over **1 million events in 10 segments** on an Intel i5-1135G7 (4c/8t). Appends are WAL-durable; reads run against real on-disk column data via mmap + AVX2 SIMD.\n\n### Write path\n\n| Operation | Latency | Throughput |\n|---|---|---|\n| `append_batch` · 4096 events | **4.76 ms** | 860K ev/s |\n| `append_batch` · 1024 events | 873 µs | 1.17M ev/s |\n| concurrent · 8t × 1024 events | **1.24 ms** | **6.6M ev/s** |\n| WAL memory-only | 352 µs | 2.9M ev/s |\n| rotation · 1 shard · 1024 | 10.1 ms | compress+fsync |\n\n### Read path\n\n| Operation | Latency | Throughput |\n|---|---|---|\n| `aggregate` · no filter | **756 µs** | 1.3G ev/s |\n| `aggregate` · user filter | **254 µs** | 3.9G ev/s |\n| `aggregate` · time range | 274 µs | 3.7G ev/s |\n| `aggregate` · user + time | **105 µs** | **9.5G ev/s** |\n| `query_recent` · 100 | **37 µs** | — |\n| `query_recent` · 1000 | 456 µs | — |\n| `query_recent` · user · 100 | 70 µs | — |\n\n### Rollup (in-memory)\n\n| Operation | Latency |\n|---|---|\n| `query_rollups` · single user · day | 7.5 µs |\n| `query_rollups` · all buckets · day | 24 µs |\n\nRun `cargo bench --workspace` to reproduce on your hardware.\n\n## Use cases\n\nThe typed schema derive maps to any append-only, time-ordered workload. Example schemas per domain:\n\n### LLM observability\n\n```rust\n#[derive(keplordb::Schema)]\n#[keplordb(id = 1)]\npub struct LlmCall {\n    #[dim(bloom, rollup)] pub user:    String,\n    #[dim(rollup)]        pub api_key: String,\n    #[dim]                pub model:   String,\n    #[dim]                pub provider: String,\n    #[counter]            pub input_tokens:  u32,\n    #[counter]            pub output_tokens: u32,\n    #[counter]            pub cache_tokens:  u32,\n    #[label]              pub region: String,\n}\n```\n\n### HTTP access logs\n\n```rust\n#[derive(keplordb::Schema)]\n#[keplordb(id = 2)]\npub struct HttpLog {\n    #[dim(bloom)] pub client_ip: String,\n    #[dim]        pub method:    String,\n    #[dim]        pub route:     String,\n    #[counter]    pub bytes:     u32,\n}\n```\n\n### IoT telemetry\n\n```rust\n#[derive(keplordb::Schema)]\n#[keplordb(id = 3)]\npub struct Reading {\n    #[dim(bloom)] pub device_id: String,\n    #[dim]        pub sensor:    String,\n}\n// metric = sensor reading (fixed-point)\n```\n\n### Payment ledger\n\n```rust\n#[derive(keplordb::Schema)]\n#[keplordb(id = 4)]\npub struct Txn {\n    #[dim(bloom, rollup)] pub merchant: String,\n    #[dim(rollup)]        pub customer: String,\n    #[dim]                pub currency: String,\n    #[dim]                pub method:   String,\n    #[counter]            pub items:    u32,\n    #[counter]            pub tax:      u32,\n}\n// metric = amount in cents\n```\n\nSee the full set under [`crates/keplordb/examples/`](crates/keplordb/examples/):\n\n- [`typed_schema.rs`](crates/keplordb/examples/typed_schema.rs) — end-to-end typed API\n- [`raw_positional.rs`](crates/keplordb/examples/raw_positional.rs) — dynamic schemas via the positional API\n- [`concurrent_writes.rs`](crates/keplordb/examples/concurrent_writes.rs) — 8-thread producers with `Arc\u003cEngine\u003e`\n- [`pagination.rs`](crates/keplordb/examples/pagination.rs) — cursor-based pagination through `query_recent`\n- [`crash_recovery.rs`](crates/keplordb/examples/crash_recovery.rs) — WAL replay on restart\n\n## Architecture\n\n```\nWrite path\n──────────\nappend_batch() ──► fetch_add → shard N ──► WAL (CRC32 + fsync/interval)\n                                              │\n                                              ▼\n                                    rotate (3-phase, crash-safe)\n                                              │\n                                              ▼\n                                        Segment .kseg\n\nRead path\n─────────\nquery() ──► ArcSwap index load\n              │\n              ▼\n       candidate segments sorted by max_ts desc\n              │\n              ▼\n       per-segment scan (AVX2 SIMD + zone-map prune)\n              │\n              ▼\n       pool → sort ts_ns desc → truncate(limit)\n              │\n              ▼\n       intern resolve via OnceLock cache\n```\n\n### Segment file layout (`.kseg`)\n\n```\n┌──────────────────────────────┐\n│  header            256 B     │  schema_id verified on open\n├──────────────────────────────┤\n│  i64 block         zstd+δ    │  delta-encoded ts_ns + metric\n│  u32 cols          latency + counters\n│  u16 cols          status + flags + dims + id + labels\n├──────────────────────────────┤\n│  bloom filter      128 B     │  primary dim skip\n│  status bitmap     zstd      │  per-value compressed bitmaps\n│  zone maps         raw       │  min/max per 256-row chunk × D\n│  intern table      zstd      │  cached via OnceLock after first access\n└──────────────────────────────┘\n```\n\n### `LogEvent\u003cD, C, L\u003e` schema\n\n`D` dims, `C` counters, `L` labels — chosen for you by `#[derive(Schema)]` based on field tags.\n\n| Field | Type | Description |\n|---|---|---|\n| `id` | `String` | Unique event identifier. Interned per segment for fast point lookup |\n| `ts_ns` | `i64` | Nanosecond timestamp. Sorted, binary-searchable |\n| `metric` | `i64` | Primary signed metric — cost, duration, value |\n| `counters[0..C]` | `u32` | Unsigned counters |\n| `latency_ms` | `u32` | Primary latency |\n| `latency_detail_ms` | `u32` | Secondary latency breakdown |\n| `status` | `u16` | Status code. Bitmap-indexed |\n| `flags` | `EventFlags` | 16 boolean bitflags (newtype) |\n| `dims[0..D]` | `String` | Indexed, filterable dimensions. Interned per segment. Zone-mapped |\n| `labels[0..L]` | `String` | Free-form string labels. Stored, not indexed |\n\nCaps: `D ≤ 256`, `C ≤ 64`, `L ≤ 64` — enforced at compile time by the derive.\n\n## API reference\n\n```rust\n// Lifecycle\nlet engine = Engine::open(config)?;\nengine.flush()?;\n\n// Write\nengine.append(\u0026event)?;\nengine.append_batch(\u0026events)?;\n\n// Read\nlet events  = engine.query_recent(\u0026filter, limit)?;\nlet totals  = engine.aggregate(\u0026filter)?;\nlet rollups = engine.query_rollups(from_day, to_day, \u0026dim_filters)?;\nlet event   = engine.get_event(\"event-id\")?;\n\n// Delete (tombstone) + GC\nengine.delete_event(\"event-id\")?;\nlet stats = engine.gc(cutoff_ts_ns)?;\n```\n\n## Durability\n\n`Engine::append` / `append_batch` writes the event into the on-disk WAL and returns once the bytes hit the kernel's page cache. The WAL is **fsync'd in batches**. Two knobs on `EngineConfig`:\n\n| Field | Default | Meaning |\n|---|---|---|\n| `wal_sync_interval` | `64` | fsync after this many events (per shard) |\n| `wal_sync_bytes` | `262_144` | fsync when buffered bytes crosses this (256 KB) |\n\nConsequence: on a hard crash, you can lose **up to one full sync interval per shard** — at defaults up to `wal_sync_interval × wal_shard_count` events. Three profiles:\n\n- **Zero-loss** — `wal_sync_interval = 1`, `wal_sync_bytes = 1`. Every append fsyncs.\n- **Balanced (default)** — 64 events / 256 KB.\n- **Best-effort** — `wal_sync_interval = u32::MAX`, `wal_sync_bytes = u64::MAX`. fsync only at rotation.\n\nOn clean shutdown call `engine.flush()` to rotate the WAL into a segment.\n\n## Production readiness\n\nKeplorDB is a **pre-1.0 release** — API and on-disk format may change before 1.0. Before adopting for production, read [`PRODUCTION.md`](PRODUCTION.md) for the explicit list of known gaps (schema evolution, observability hooks, fuzzing, sanitizer CI, replication, backup/restore, size-based GC).\n\nShort version: ship it today for embedded-in-process structured log storage where the operator owns both sides. Defer for externally-sourced input, regulated workloads, multi-node deployments, or anything needing per-row durability.\n\n## Dependencies\n\n| Crate | Purpose |\n|---|---|\n| `zstd` | Compression for the i64 block, intern table, status bitmap |\n| `zerocopy` | Zero-copy column reads for u32/u16 columns |\n| `memmap2` | Memory-mapped segment files |\n| `thiserror` | Error type derivation |\n| `rustc-hash` | FxHash for the bloom filter |\n| `hashbrown` | Arena-backed string intern table |\n| `mimalloc` | Global allocator |\n| `rayon` | Parallel cross-segment aggregate |\n| `crc32fast` | WAL record integrity checksums |\n| `arc-swap` | Lock-free segment index + tombstone reads |\n| `keplordb-macros` | `#[derive(Schema)]` proc-macro (re-exported) |\n\nThe proc-macro crate pulls `syn 2`, `quote`, `proc-macro2` as build-time dependencies only.\n\n## License\n\n[Apache-2.0](LICENSE)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthemankindproject%2Fkeplordb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthemankindproject%2Fkeplordb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthemankindproject%2Fkeplordb/lists"}