https://github.com/themankindproject/keplordb

A columnar log engine optimized for high-throughput ingestion of structured, append-only, time-ordered events.
Host: GitHub
URL: https://github.com/themankindproject/keplordb
Owner: themankindproject
License: apache-2.0
Created: 2026-04-19T13:40:54.000Z (3 months ago)
Default Branch: main
Last Pushed: 2026-04-28T18:17:38.000Z (3 months ago)
Last Synced: 2026-04-28T20:19:37.382Z (3 months ago)
Topics: append-only, avx2, columnar-storage, database, db, embeddable, iot, log-engine, logging, mmap, rust, simd, telemetry, time-series, wal
Language: HTML
Homepage: https://keplordb.pages.dev/
Size: 970 KB
Stars: 1
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# KeplorDB

[![crates.io](https://img.shields.io/crates/v/keplordb.svg)](https://crates.io/crates/keplordb)

[![docs.rs](https://docs.rs/keplordb/badge.svg)](https://docs.rs/keplordb)

[![CI](https://github.com/themankindproject/keplordb/actions/workflows/ci.yml/badge.svg)](https://github.com/themankindproject/keplordb/actions)

[![License: Apache-2.0](https://img.shields.io/badge/license-Apache--2.0-blue.svg)](./LICENSE)

A columnar, append-only log engine written in Rust, purpose-built for high-throughput structured event ingestion. It targets LLM observability, HTTP access logs, IoT telemetry, payment ledgers, and any other workload that is append-only and time-ordered.

## Why

Existing options are either too heavy (ClickHouse, TimescaleDB) or too general (SQLite, RocksDB). **KeplorDB is an embeddable library** with no server process, no SQL parser, and no background threads. Declare your schema with a derive macro, open the engine, append events, query columns.

## Install

```toml
[dependencies]
keplordb = "0.1"
```

Or via git (pre-crates.io):

```toml
[dependencies]
keplordb = { git = "https://github.com/themankindproject/keplordb" }
```

Requires Rust **1.82** or newer. The workspace ships two crates: `keplordb` (engine) and `keplordb-macros` (the `#[derive(Schema)]` proc-macro, re-exported from the main crate). A single dependency on `keplordb` pulls in both.

## Quick Start — typed schema derive

```rust
use keplordb::{Engine, Schema};

#[derive(Schema)]
#[keplordb(id = 1)]
pub struct AiCall {
    #[dim(bloom, rollup)] pub user:   String,
    #[dim(rollup)]        pub org:    String,
    #[dim]                pub model:  String,
    #[counter]            pub input_tokens:  u32,
    #[counter]            pub output_tokens: u32,
    #[label]              pub region: String,
}

fn main() -> Result<(), keplordb::DbError> {
    // `default_config` prefills bloom_dim + rollup_dims from the schema.
    let engine: Engine<3, 2, 1> =
        Engine::open(AiCall::default_config("/tmp/logs".into()))?;

    // Fluent typed builder — no positional indexing.
    engine.append(
        &AiCall::new(ts_ns())
            .user("alice")
            .org("acme")
            .model("gpt-4o")
            .input_tokens(1_200)
            .output_tokens(850)
            .region("us-east")
            .metric(5_000_000)  // cost in nanodollars
            .status(200)
            .into_log_event(),
    )?;

    // Named filter — maps each setter to the right dim internally.
    let alice = engine.aggregate(
        &AiCall::filter().user("alice").into_filter(),
    )?;
    println!("alice events: {}, cost sum: {}", alice.event_count, alice.metric);

    engine.flush()?;
    Ok(())
}

fn ts_ns() -> i64 {
    std::time::SystemTime::now()
        .duration_since(std::time::UNIX_EPOCH)
        .unwrap()
        .as_nanos() as i64
}
```

The raw positional `LogEvent` + `QueryFilter` API is still available for dynamic / runtime-decided schemas — see [`examples/raw_positional.rs`](crates/keplordb/examples/raw_positional.rs).

## Features

### Ergonomics

- **Typed schema derive** — `#[derive(keplordb::Schema)]` generates typed event + filter builders. No positional `dims[0]` indexing anywhere in user code

- **Compile-time validation** — a wrong counter type, two bloom dims, a missing `schema_id`, or exceeded caps are all rejected by the macro, with spans pointing at your source

- **Schema ID guard** — segment headers record the active schema; opening a data directory with a mismatched `schema_id` fails fast with `DbError::Corrupt`

### Storage

- **Columnar segments** — queries touch only the columns they need; aggregations scan contiguous arrays via mmap

- **Zero-copy reads** — u32/u16 columns read directly from mmap'd segment files via `zerocopy`; no deserialization

- **Pre-decompressed i64 cache** — `ts_ns` + `metric` delta-decoded once at segment open and reused across readers

- **Intern table cache** — string → u16 resolve table decompressed once per segment via `OnceLock`; 170× faster filtered aggregate after warmup
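To make the "decode once, reuse across readers" idea concrete, here is a minimal sketch using only the standard library. `TsColumn` is a hypothetical stand-in, not a KeplorDB type, and it deliberately combines two items from the list above (delta decoding and `OnceLock` caching) in one place; the README says the engine decodes the i64 block at segment open, so the lazy decode here is an illustration only.

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for one segment's timestamp column.
pub struct TsColumn {
    /// First value followed by successive differences.
    deltas: Vec<i64>,
    /// Decoded values, filled on first access and shared afterwards.
    decoded: OnceLock<Vec<i64>>,
}

impl TsColumn {
    /// Delta-encode a column of timestamps.
    pub fn encode(values: &[i64]) -> Self {
        let mut prev = 0i64;
        let deltas = values
            .iter()
            .map(|&v| {
                let d = v.wrapping_sub(prev);
                prev = v;
                d
            })
            .collect();
        Self { deltas, decoded: OnceLock::new() }
    }

    /// Decode on first call; later calls return the cached slice.
    pub fn values(&self) -> &[i64] {
        self.decoded.get_or_init(|| {
            let mut acc = 0i64;
            self.deltas
                .iter()
                .map(|&d| {
                    acc = acc.wrapping_add(d);
                    acc
                })
                .collect()
        })
    }
}
```

Because `OnceLock` is `Sync`, several reader threads can call `values()` on a shared `TsColumn`; only one of them pays for the decode.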

### Indexing

- **Bloom filter skip** — per-segment bloom on the primary dimension; skip entire files on mismatch

- **Zone maps per-dimension** — min/max per 256-row chunk, built with AVX2 SIMD min/max during rotation; chunks that can't match a filter are skipped before any column access

- **Status bitmap index** — per-value compressed bitmaps for O(1) status lookups; full-scan fallback
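The zone-map pruning described above can be sketched in scalar Rust. This is an assumption-laden illustration: `Zone`, `build_zones`, and `count_eq` are hypothetical names, the column is modelled as a plain `&[u16]` of interned dimension ids, and the min/max pass is scalar rather than AVX2.

```rust
/// Rows per zone-map chunk, as stated in the README.
const CHUNK: usize = 256;

/// Min and max of one chunk of a u16 dimension-id column.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Zone {
    pub min: u16,
    pub max: u16,
}

/// Build one zone per 256-row chunk (scalar stand-in for the SIMD pass).
pub fn build_zones(col: &[u16]) -> Vec<Zone> {
    col.chunks(CHUNK)
        .map(|c| Zone {
            // `chunks` never yields an empty slice, so unwrap is safe here.
            min: *c.iter().min().unwrap(),
            max: *c.iter().max().unwrap(),
        })
        .collect()
}

/// Count rows equal to `target`, skipping chunks whose range excludes it.
/// Returns (matching rows, chunks actually scanned).
pub fn count_eq(col: &[u16], zones: &[Zone], target: u16) -> (usize, usize) {
    let mut hits = 0;
    let mut scanned = 0;
    for (chunk, zone) in col.chunks(CHUNK).zip(zones) {
        if target < zone.min || target > zone.max {
            continue; // pruned without reading the chunk
        }
        scanned += 1;
        hits += chunk.iter().filter(|&&v| v == target).count();
    }
    (hits, scanned)
}
```

Pruning pays off most when equal ids cluster in time, since a chunk whose range is narrow excludes most filter values.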

### Throughput

- **AVX2 SIMD aggregation** — vectorized sum, count, filtered-sum using 256-bit registers with hardware prefetching and scalar fallback

- **Sharded WAL, batch-routed** — each `append_batch` claims one shard via `fetch_add`; N concurrent writers → N shards, zero contention in the common case

- **Rayon parallel scan** — cross-segment aggregates fan out across cores

- **`query_recent` global merge** — candidates sorted by `max_ts` descending, per-segment results pooled + merged by `ts_ns` descending, with early termination once the kth-best ts exceeds any remaining segment's `max_ts`
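The `query_recent` merge with early termination might look roughly like the sketch below. `Seg` is a hypothetical in-memory segment holding only ascending timestamps, and the function shares nothing with the real engine beyond the strategy named in the bullet: visit segments by `max_ts` descending, pool results, and stop once the kth-best timestamp exceeds the `max_ts` of everything left.

```rust
use std::cmp::Reverse;

/// Hypothetical in-memory segment: timestamps sorted ascending.
pub struct Seg {
    pub ts: Vec<i64>,
}

impl Seg {
    pub fn max_ts(&self) -> i64 {
        self.ts.last().copied().unwrap_or(i64::MIN)
    }
}

/// Return the `k` newest timestamps across all segments (newest first)
/// together with the number of segments that were actually scanned.
pub fn query_recent(segs: &[Seg], k: usize) -> (Vec<i64>, usize) {
    if k == 0 {
        return (Vec::new(), 0);
    }
    // Visit segments holding the newest data first.
    let mut order: Vec<&Seg> = segs.iter().collect();
    order.sort_by_key(|s| Reverse(s.max_ts()));

    let mut pool: Vec<i64> = Vec::new();
    let mut scanned = 0;
    for seg in order {
        // The pool is full and its kth-best row is newer than anything this
        // segment (or any later one) can hold, so no further scan can help.
        if pool.len() == k && pool[k - 1] > seg.max_ts() {
            break;
        }
        scanned += 1;
        // At most k rows from one segment can make the final cut.
        pool.extend(seg.ts.iter().rev().take(k));
        pool.sort_unstable_by(|a, b| b.cmp(a));
        pool.truncate(k);
    }
    (pool, scanned)
}
```

With segments covering `[1, 2, 3]`, `[10, 11, 12]`, and `[5, 6]`, asking for the 2 newest rows scans only the middle segment.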

### Durability

- **CRC32-framed WAL** — every record checksummed; partial frames detected and recovered up to the last complete frame on replay

- **Crash-safe rotation** — three-phase `rename → write segment → unlink`; orphaned `*.wal.rotating` files replayed on next open

- **Tunable fsync** — batched per `wal_sync_interval` (default 64 events) or `wal_sync_bytes` (default 256 KB). Set both to `1` for zero-loss, or to `u32::MAX` and `u64::MAX` respectively for best-effort
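As a rough illustration of CRC32 framing and replay up to the last complete frame, the sketch below uses an assumed frame layout of `[len: u32 LE][crc: u32 LE][payload]`; the actual KeplorDB WAL format is not documented here. It also uses a slow, dependency-free bitwise CRC-32 where the engine lists `crc32fast`.

```rust
/// Bitwise CRC-32 (IEEE, reflected polynomial 0xEDB88320).
pub fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All ones when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Append one frame using the assumed layout [len][crc][payload].
pub fn write_frame(wal: &mut Vec<u8>, payload: &[u8]) {
    wal.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    wal.extend_from_slice(&crc32(payload).to_le_bytes());
    wal.extend_from_slice(payload);
}

fn read_u32(buf: &[u8], at: usize) -> u32 {
    u32::from_le_bytes([buf[at], buf[at + 1], buf[at + 2], buf[at + 3]])
}

/// Return every payload up to the first truncated or corrupt frame.
pub fn replay(wal: &[u8]) -> Vec<&[u8]> {
    let mut out = Vec::new();
    let mut pos = 0;
    while wal.len() - pos >= 8 {
        let len = read_u32(wal, pos) as usize;
        let crc = read_u32(wal, pos + 4);
        let start = pos + 8;
        // A partial trailing frame ends the replay.
        let Some(payload) = wal.get(start..start.saturating_add(len)) else {
            break;
        };
        // A checksum mismatch also ends the replay.
        if crc32(payload) != crc {
            break;
        }
        out.push(payload);
        pos = start + len;
    }
    out
}
```

Everything before the first bad frame is recovered; everything from that frame onward is discarded, which matches the "last complete frame" behaviour described above.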

### Operations

- **Segment-level GC** — `engine.gc(cutoff)` drops segments whose `max_ts` is below the threshold. No compaction, no write amplification, no read pause

- **Lock-free reads** — segment manifest + tombstones behind `ArcSwap`; one atomic load per query

- **Embeddable** — `Engine::open()` in your Rust binary. No TCP, no SQL, no external service
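Segment-level GC reduces to a filter over the segment manifest. The sketch below assumes a hypothetical `SegmentMeta` record and an in-memory manifest; it does not touch files and is not the `engine.gc` implementation, only the selection rule from the list above (drop segments whose `max_ts` is below the cutoff).

```rust
/// Hypothetical manifest entry for one segment file.
#[derive(Debug, Clone, PartialEq)]
pub struct SegmentMeta {
    pub path: String,
    pub max_ts: i64,
}

/// Remove segments whose newest event is older than `cutoff_ts_ns`.
/// Returns the dropped entries so the caller can unlink their files.
pub fn gc(manifest: &mut Vec<SegmentMeta>, cutoff_ts_ns: i64) -> Vec<SegmentMeta> {
    let (keep, dropped): (Vec<_>, Vec<_>) = std::mem::take(manifest)
        .into_iter()
        .partition(|s| s.max_ts >= cutoff_ts_ns);
    *manifest = keep;
    dropped
}
```

Whole segments are dropped or kept, so a segment that straddles the cutoff survives until its newest event ages out.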

## Performance

Measured with Criterion over **1 million events in 10 segments** on an Intel i5-1135G7 (4c/8t). Appends are WAL-durable; reads run against real on-disk column data via mmap + AVX2 SIMD.

### Write path

| Operation | Latency | Throughput |
|---|---|---|
| `append_batch` · 4096 events | **4.76 ms** | 860K ev/s |
| `append_batch` · 1024 events | 873 µs | 1.17M ev/s |
| concurrent · 8t × 1024 events | **1.24 ms** | **6.6M ev/s** |
| WAL memory-only | 352 µs | 2.9M ev/s |
| rotation · 1 shard · 1024 | 10.1 ms | compress+fsync |

### Read path

| Operation | Latency | Throughput |
|---|---|---|
| `aggregate` · no filter | **756 µs** | 1.3G ev/s |
| `aggregate` · user filter | **254 µs** | 3.9G ev/s |
| `aggregate` · time range | 274 µs | 3.7G ev/s |
| `aggregate` · user + time | **105 µs** | **9.5G ev/s** |
| `query_recent` · 100 | **37 µs** | — |
| `query_recent` · 1000 | 456 µs | — |
| `query_recent` · user · 100 | 70 µs | — |

### Rollup (in-memory)

| Operation | Latency |
|---|---|
| `query_rollups` · single user · day | 7.5 µs |
| `query_rollups` · all buckets · day | 24 µs |

Run `cargo bench --workspace` to reproduce on your hardware.

## Use cases

The typed schema derive maps to any append-only, time-ordered workload. Example schemas per domain:

### LLM observability

```rust
#[derive(keplordb::Schema)]
#[keplordb(id = 1)]
pub struct LlmCall {
    #[dim(bloom, rollup)] pub user:    String,
    #[dim(rollup)]        pub api_key: String,
    #[dim]                pub model:   String,
    #[dim]                pub provider: String,
    #[counter]            pub input_tokens:  u32,
    #[counter]            pub output_tokens: u32,
    #[counter]            pub cache_tokens:  u32,
    #[label]              pub region: String,
}
```

### HTTP access logs

```rust
#[derive(keplordb::Schema)]
#[keplordb(id = 2)]
pub struct HttpLog {
    #[dim(bloom)] pub client_ip: String,
    #[dim]        pub method:    String,
    #[dim]        pub route:     String,
    #[counter]    pub bytes:     u32,
}
```

### IoT telemetry

```rust
#[derive(keplordb::Schema)]
#[keplordb(id = 3)]
pub struct Reading {
    #[dim(bloom)] pub device_id: String,
    #[dim]        pub sensor:    String,
}

// metric = sensor reading (fixed-point)
```

### Payment ledger

```rust
#[derive(keplordb::Schema)]
#[keplordb(id = 4)]
pub struct Txn {
    #[dim(bloom, rollup)] pub merchant: String,
    #[dim(rollup)]        pub customer: String,
    #[dim]                pub currency: String,
    #[dim]                pub method:   String,
    #[counter]            pub items:    u32,
    #[counter]            pub tax:      u32,
}

// metric = amount in cents
```

See the full set under [`crates/keplordb/examples/`](crates/keplordb/examples/):

- [`typed_schema.rs`](crates/keplordb/examples/typed_schema.rs) — end-to-end typed API

- [`raw_positional.rs`](crates/keplordb/examples/raw_positional.rs) — dynamic schemas via the positional API

- [`concurrent_writes.rs`](crates/keplordb/examples/concurrent_writes.rs) — 8-thread producers with `Arc`

- [`pagination.rs`](crates/keplordb/examples/pagination.rs) — cursor-based pagination through `query_recent`

- [`crash_recovery.rs`](crates/keplordb/examples/crash_recovery.rs) — WAL replay on restart

## Architecture

```
Write path
──────────
append_batch() ──► fetch_add → shard N ──► WAL (CRC32 + fsync/interval)
                                              │
                                              ▼
                                    rotate (3-phase, crash-safe)
                                              │
                                              ▼
                                        Segment .kseg

Read path
─────────
query() ──► ArcSwap index load
              │
              ▼
       candidate segments sorted by max_ts desc
              │
              ▼
       per-segment scan (AVX2 SIMD + zone-map prune)
              │
              ▼
       pool → sort ts_ns desc → truncate(limit)
              │
              ▼
       intern resolve via OnceLock cache
```
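The `fetch_add → shard N` step on the write path can be sketched with a shared atomic counter. `ShardedWal` below is a hypothetical in-memory model: each shard is a `Mutex<Vec<u64>>` instead of a file, and the modulo wrap-around is an assumption about how batches map onto a fixed number of shards.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// Hypothetical in-memory model of a sharded WAL.
pub struct ShardedWal {
    next: AtomicUsize,
    shards: Vec<Mutex<Vec<u64>>>,
}

impl ShardedWal {
    pub fn new(shard_count: usize) -> Self {
        assert!(shard_count > 0, "need at least one shard");
        Self {
            next: AtomicUsize::new(0),
            shards: (0..shard_count).map(|_| Mutex::new(Vec::new())).collect(),
        }
    }

    /// Route a whole batch to one shard and return the shard index used.
    pub fn append_batch(&self, events: &[u64]) -> usize {
        // One atomic increment per batch picks the shard.
        let idx = self.next.fetch_add(1, Ordering::Relaxed) % self.shards.len();
        self.shards[idx].lock().unwrap().extend_from_slice(events);
        idx
    }

    pub fn shard_len(&self, idx: usize) -> usize {
        self.shards[idx].lock().unwrap().len()
    }

    pub fn total_len(&self) -> usize {
        (0..self.shards.len()).map(|i| self.shard_len(i)).sum()
    }
}
```

With as many shards as concurrent writers, consecutive batches land on different locks, so writers rarely wait on each other; two batches only contend when the counter wraps onto a shard that is still busy.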

### Segment file layout (`.kseg`)

```
┌──────────────────────────────┐
│  header            256 B     │  schema_id verified on open
├──────────────────────────────┤
│  i64 block         zstd+δ    │  delta-encoded ts_ns + metric
│  u32 cols                    │  latency + counters
│  u16 cols                    │  status + flags + dims + id + labels
├──────────────────────────────┤
│  bloom filter      128 B     │  primary dim skip
│  status bitmap     zstd      │  per-value compressed bitmaps
│  zone maps         raw       │  min/max per 256-row chunk × D
│  intern table      zstd      │  cached via OnceLock after first access
└──────────────────────────────┘
```
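For a sense of what a 128-byte bloom filter gives you, here is a minimal sketch with 1024 bits and three probe positions per key. The number of probes, the double-hashing scheme, and the use of the standard library's `DefaultHasher` are all assumptions made for illustration; the dependency table lists `rustc-hash` (FxHash) for the real filter.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const BLOOM_BYTES: usize = 128; // size taken from the layout above
const BLOOM_BITS: u64 = (BLOOM_BYTES * 8) as u64;

/// Hypothetical fixed-size bloom filter over primary-dimension strings.
pub struct Bloom {
    bits: [u8; BLOOM_BYTES],
}

fn seeded_hash(key: &str, seed: u64) -> u64 {
    let mut h = DefaultHasher::new();
    seed.hash(&mut h);
    key.hash(&mut h);
    h.finish()
}

impl Bloom {
    pub fn new() -> Self {
        Self { bits: [0; BLOOM_BYTES] }
    }

    /// Three bit positions derived by double hashing (assumed scheme).
    fn positions(key: &str) -> [usize; 3] {
        let a = seeded_hash(key, 0);
        let b = seeded_hash(key, 1) | 1;
        [0u64, 1, 2].map(|i| (a.wrapping_add(i.wrapping_mul(b)) % BLOOM_BITS) as usize)
    }

    pub fn insert(&mut self, key: &str) {
        for p in Self::positions(key) {
            self.bits[p / 8] |= 1 << (p % 8);
        }
    }

    /// `false` means the key is definitely absent, so the segment can be skipped.
    pub fn may_contain(&self, key: &str) -> bool {
        Self::positions(key)
            .iter()
            .all(|&p| self.bits[p / 8] & (1 << (p % 8)) != 0)
    }
}
```

A filter can answer "maybe" for a key that was never inserted, so a positive result still needs a column scan; only a negative result lets a query skip the file.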

### `LogEvent` schema

`D` dims, `C` counters, `L` labels — chosen for you by `#[derive(Schema)]` based on field tags.

| Field | Type | Description |
|---|---|---|
| `id` | `String` | Unique event identifier. Interned per segment for fast point lookup |
| `ts_ns` | `i64` | Nanosecond timestamp. Sorted, binary-searchable |
| `metric` | `i64` | Primary signed metric — cost, duration, value |
| `counters[0..C]` | `u32` | Unsigned counters |
| `latency_ms` | `u32` | Primary latency |
| `latency_detail_ms` | `u32` | Secondary latency breakdown |
| `status` | `u16` | Status code. Bitmap-indexed |
| `flags` | `EventFlags` | 16 boolean bitflags (newtype) |
| `dims[0..D]` | `String` | Indexed, filterable dimensions. Interned per segment. Zone-mapped |
| `labels[0..L]` | `String` | Free-form string labels. Stored, not indexed |

Caps: `D ≤ 256`, `C ≤ 64`, `L ≤ 64` — enforced at compile time by the derive.
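One way to picture a compile-time cap on `D`, `C`, and `L` is a const-generic assertion, sketched below. This is not how the derive macro works (the README says the macro itself rejects oversized schemas); `Shape` and `checked` are hypothetical names used only to show the caps being enforced before the program runs.

```rust
/// Hypothetical marker type carrying the schema shape as const generics.
pub struct Shape<const D: usize, const C: usize, const L: usize>;

impl<const D: usize, const C: usize, const L: usize> Shape<D, C, L> {
    /// Evaluated at compile time for every shape that is actually used.
    const WITHIN_CAPS: () = assert!(
        D <= 256 && C <= 64 && L <= 64,
        "schema exceeds the D/C/L caps"
    );

    /// Returns the shape; referencing `WITHIN_CAPS` forces the check.
    pub const fn checked() -> (usize, usize, usize) {
        let () = Self::WITHIN_CAPS;
        (D, C, L)
    }
}
```

`Shape::<3, 2, 1>::checked()` compiles and returns `(3, 2, 1)`, while `Shape::<300, 2, 1>::checked()` fails the build with the assertion message.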

## API reference

```rust
// Lifecycle
let engine = Engine::open(config)?;
engine.flush()?;

// Write
engine.append(&event)?;
engine.append_batch(&events)?;

// Read
let events  = engine.query_recent(&filter, limit)?;
let totals  = engine.aggregate(&filter)?;
let rollups = engine.query_rollups(from_day, to_day, &dim_filters)?;
let event   = engine.get_event("event-id")?;

// Delete (tombstone) + GC
engine.delete_event("event-id")?;
let stats = engine.gc(cutoff_ts_ns)?;
```

## Durability

`Engine::append` / `append_batch` writes the event into the on-disk WAL and returns once the bytes hit the kernel's page cache. The WAL is **fsync'd in batches**. Two knobs on `EngineConfig`:

| Field | Default | Meaning |
|---|---|---|
| `wal_sync_interval` | `64` | fsync after this many events (per shard) |
| `wal_sync_bytes` | `262_144` | fsync when buffered bytes cross this threshold (256 KB) |

Consequence: on a hard crash, you can lose **up to one full sync interval per shard** — at defaults up to `wal_sync_interval × wal_shard_count` events. Three profiles:

- **Zero-loss** — `wal_sync_interval = 1`, `wal_sync_bytes = 1`. Every append fsyncs.

- **Balanced (default)** — 64 events / 256 KB.

- **Best-effort** — `wal_sync_interval = u32::MAX`, `wal_sync_bytes = u64::MAX`. fsync only at rotation.

On clean shutdown call `engine.flush()` to rotate the WAL into a segment.
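The two knobs and the worst-case loss bound can be modelled in a few lines. `SyncPolicy` is a hypothetical struct that borrows the field names from the table above; the `>=` comparisons and the shard count passed to `max_events_at_risk` are assumptions, since this README does not state the exact trigger condition or the default `wal_shard_count`.

```rust
/// Hypothetical model of the two fsync knobs on `EngineConfig`.
pub struct SyncPolicy {
    pub wal_sync_interval: u32,
    pub wal_sync_bytes: u64,
}

impl SyncPolicy {
    pub const ZERO_LOSS: SyncPolicy = SyncPolicy { wal_sync_interval: 1, wal_sync_bytes: 1 };
    pub const BALANCED: SyncPolicy = SyncPolicy { wal_sync_interval: 64, wal_sync_bytes: 262_144 };
    pub const BEST_EFFORT: SyncPolicy =
        SyncPolicy { wal_sync_interval: u32::MAX, wal_sync_bytes: u64::MAX };

    /// Whether a shard with this much unsynced data should fsync now.
    /// Either threshold triggers a sync (assumed to be inclusive).
    pub fn should_sync(&self, pending_events: u32, pending_bytes: u64) -> bool {
        pending_events >= self.wal_sync_interval || pending_bytes >= self.wal_sync_bytes
    }

    /// Upper bound from the text: one full sync interval per shard.
    pub fn max_events_at_risk(&self, wal_shard_count: u32) -> u64 {
        u64::from(self.wal_sync_interval) * u64::from(wal_shard_count)
    }
}
```

For example, with the default interval of 64 and an assumed 8 shards, the bound works out to 512 events; the byte threshold can only make a sync happen sooner.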

## Production readiness

KeplorDB is a **pre-1.0 release** — API and on-disk format may change before 1.0. Before adopting for production, read [`PRODUCTION.md`](PRODUCTION.md) for the explicit list of known gaps (schema evolution, observability hooks, fuzzing, sanitizer CI, replication, backup/restore, size-based GC).

Short version: ship it today for embedded-in-process structured log storage where the operator owns both sides. Defer for externally-sourced input, regulated workloads, multi-node deployments, or anything needing per-row durability.

## Dependencies

| Crate | Purpose |
|---|---|
| `zstd` | Compression for the i64 block, intern table, status bitmap |
| `zerocopy` | Zero-copy column reads for u32/u16 columns |
| `memmap2` | Memory-mapped segment files |
| `thiserror` | Error type derivation |
| `rustc-hash` | FxHash for the bloom filter |
| `hashbrown` | Arena-backed string intern table |
| `mimalloc` | Global allocator |
| `rayon` | Parallel cross-segment aggregate |
| `crc32fast` | WAL record integrity checksums |
| `arc-swap` | Lock-free segment index + tombstone reads |
| `keplordb-macros` | `#[derive(Schema)]` proc-macro (re-exported) |

The proc-macro crate pulls `syn 2`, `quote`, `proc-macro2` as build-time dependencies only.

## License

[Apache-2.0](LICENSE)