{"id":51086997,"url":"https://github.com/abhishek08aug/litedb","last_synced_at":"2026-06-23T22:33:10.689Z","repository":{"id":365845601,"uuid":"1227506093","full_name":"abhishek08aug/litedb","owner":"abhishek08aug","description":"A database from scratch - WAL, LSM-Tree, B+ Tree, MVCC, Raft and consistent hashing implemented from first principles in Python and Java","archived":false,"fork":false,"pushed_at":"2026-06-19T05:50:44.000Z","size":322,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-19T07:26:40.850Z","etag":null,"topics":["b-plus-tree","consensus","consistent-hashing","database","distributed-systems","from-scratch","java","lsm-tree","mvcc","python","query-engine","raft","sql","storage-engine","write-ahead-log"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/abhishek08aug.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-02T19:25:51.000Z","updated_at":"2026-06-19T05:50:48.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/abhishek08aug/litedb","commit_stats":null,"previous_names":["abhishek08aug/litedb"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/abhishek08aug/litedb","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abhishek08aug%2Flitedb","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abhishek08aug%2Flitedb/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abhishek08aug%2Flitedb/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abhishek08aug%2Flitedb/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/abhishek08aug","download_url":"https://codeload.github.com/abhishek08aug/litedb/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abhishek08aug%2Flitedb/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34709804,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-23T02:00:07.161Z","response_time":65,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["b-plus-tree","consensus","consistent-hashing","database","distributed-systems","from-scratch","java","lsm-tree","mvcc","python","query-engine","raft","sql","storage-engine","write-ahead-log"],"created_at":"2026-06-23T22:33:09.320Z","updated_at":"2026-06-23T22:33:10.685Z","avatar_url":"https://github.com/abhishek08aug.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# LiteDB\n\n\u003e **A database engine built from first principles — in Python and Java.**\n\nLiteDB is a fully working key-value and SQL database implementing the core algorithms behind PostgreSQL, Cassandra, etcd, and RocksDB — storage, transactions, query processing, replication, consensus, and observability — with zero external dependencies. Each subsystem is documented end-to-end (concept → implementation).\n\n![Python](https://img.shields.io/badge/python-3.10%2B-blue)\n![Java](https://img.shields.io/badge/java-11%2B-blue)\n![License](https://img.shields.io/badge/license-MIT-green)\n![Tests](https://img.shields.io/badge/tests-passed-brightgreen)\n![Dependencies](https://img.shields.io/badge/dependencies-none-lightgrey)\n\n---\n\n**New here?** [ARCHITECTURE.md](ARCHITECTURE.md) is a layer-by-layer walkthrough of how it works\nand *why* each decision was made; [ROADMAP.md](ROADMAP.md) maps what's built vs. the path to a\nproduction-grade database.\n\n## What's implemented\n\n| Layer | What | Real-world analogue |\n|-------|------|---------------------|\n| Storage | WAL, MemTable, SSTable, LSM-Tree, B+ Tree | LevelDB, RocksDB, PostgreSQL |\n| Transactions | MVCC, snapshot isolation, VACUUM | PostgreSQL, MySQL InnoDB |\n| Query | SQL parser, query planner, executor | SQLite, DuckDB |\n| Distribution | Consistent hashing, async replication, Raft, gossip discovery | Cassandra, etcd, CockroachDB |\n| Operations | PBKDF2 auth, RBAC, connection pool, rate limiter | PgBouncer, ProxySQL |\n| Observability | Prometheus metrics, slow query log, distributed tracing | Prometheus, Jaeger |\n\n\u003e **Scope note.** The transactional engine (storage + SQL + MVCC) is a complete single-node\n\u003e database. On top of it — **in both Python and Java at parity** — runs an **integrated\n\u003e single-machine distributed cluster**: multiple instances that partition data into shards\n\u003e (consistent hashing), replicate each shard through its own Raft group (multi-raft), route\n\u003e requests to leaders (with a configurable replication factor), commit cross-shard transactions via\n\u003e 2PC, recover in-doubt 2PC after a coordinator *or* participant crash (prepared intents are\n\u003e replicated through Raft), and **add/remove nodes online** — shards (with their data) rebalance via\n\u003e Raft membership changes. Nodes **discover each other via gossip** from a single seed (SWIM/Cassandra\n\u003e style, not a static list), with locally-derived alive/suspect/dead liveness. The **control plane is\n\u003e its own Raft group** (a co-located placement driver, the TiKV PD model): membership decisions are\n\u003e committed to the PD log, and its leader runs a gossip **failure detector** that, when a node dies,\n\u003e **auto-re-replicates** its shards to restore the replication factor — with no operator action and\n\u003e surviving a PD-leader crash (so the control plane is not a SPOF). A live web dashboard\n\u003e shows cluster health, config, the consistent-hash ring, the shard→node placement matrix, the live\n\u003e gossip membership matrix, and one event feed per instance narrating its reasoning;\n\u003e you can kill/restart nodes, add/remove nodes, and watch failover + rebalancing. Launch it with\n\u003e `python dashboard.py` or `java com.litedb.cluster.Dashboard`. It is a real end-to-end integration\n\u003e on one machine; it is **not** hardened for the cross-machine failure matrix (range split/merge,\n\u003e snapshot install, parallel-commit, Jepsen-grade testing). See\n\u003e [ROADMAP.md](ROADMAP.md) for what is built vs. remains.\n\n---\n\n## Quick start\n\n### Python\n\n```bash\ngit clone https://github.com/abhishek08aug/litedb.git\ncd litedb/litedb-python\n\n# All 15 modules — WAL → MemTable → SSTable → LSM-Tree → Parser → Replication → Gossip\n#                   → Transactions → B-Tree → SQL → Sharding → Raft → Auth → Metrics\npython run_demo.py\n```\n\nExpected output:\n\n```\nResults: 14 passed, 0 failed\n```\n\nRun a single module directly:\n\n```bash\npython run_demo.py wal          # wal.py\npython run_demo.py btree        # btree.py\npython run_demo.py raft         # raft.py\npython transactions.py          # standalone\n```\n\nRun the TCP server:\n\n```bash\n# Primary on port 7379\npython server.py --port 7379 --data-dir ./data/primary\n\n# Replica on port 7380 (streams WAL from primary)\npython server.py --port 7380 --data-dir ./data/replica --replica-of localhost:7379\n\n# Connect with netcat\nnc localhost 7379\nSET name Alice\nGET name\nSCAN a z\nDELETE name\n```\n\n### Java\n\n```bash\ncd litedb/litedb-java\n\n# Compile (no Maven required)\njavac --release 11 -d target/classes \\\n  $(find src/main/java -name \"*.java\" | sort)\n\n# Run the full integration demo (all 13 modules)\njava -cp target/classes com.litedb.demo.RunDemo\n\n# Run a single module\njava -cp target/classes com.litedb.btree.BPlusTree\njava -cp target/classes com.litedb.raft.RaftNode\n```\n\nExpected output:\n\n```\n=== COMPILE OK ===\n  LiteDB — Full Integration Demo\n...\n  Total: 13 core database subsystems, ~3000 lines of Java.\n[LiteDB Demo Complete]\n```\n\n---\n\n## Repository layout\n\n```\nlitedb/                          ← repo root\n├── README.md                    ← you are here\n├── LICENSE                      ← MIT\n├── CONTRIBUTING.md\n├── CHANGELOG.md\n├── .gitignore\n│\n├── .github/\n│   ├── ISSUE_TEMPLATE/\n│   │   ├── bug_report.md\n│   │   └── feature_request.md\n│   └── PULL_REQUEST_TEMPLATE.md\n│\n├── docs/                        ← theory curriculum (10 modules, 18 articles)\n│   ├── README.md                ← docs index \u0026 concept→implementation map\n│   ├── 01-fundamentals/         ← what is a database, architecture, WAL\n│   ├── 02-acid/                 ← ACID properties, lock internals\n│   ├── 03-storage-engines/      ← LSM-Tree, B+ Tree, compaction\n│   ├── 04-indexing/             ← B+ Tree, Hash, Bloom Filter, Inverted\n│   ├── 05-query-processing/     ← tokenizer → AST → planner → executor\n│   ├── 06-mvcc/                 ← MVCC, snapshot isolation, VACUUM\n│   ├── 07-distributed-systems/  ← CAP theorem, eventual consistency, CRDTs\n│   ├── 08-sharding/             ← consistent hashing, virtual nodes\n│   ├── 09-replication/          ← sync/async replication, Raft\n│   └── 10-nosql/                ← NoSQL design patterns\n│\n├── litedb-python/               ← Python implementation (pure stdlib)\n│   ├── wal.py                    Basic: Write-Ahead Log\n│   ├── memtable.py               Basic: MemTable\n│   ├── sstable.py                Basic: SSTable + Bloom Filter\n│   ├── lsm_engine.py             Basic: Full LSM-Tree + Compaction\n│   ├── query_parser.py           Basic: SET/GET/DELETE/SCAN parser\n│   ├── server.py                 Basic: Multi-client TCP server\n│   ├── replication.py            Basic: Async WAL streaming\n│   ├── transactions.py           Advanced: MVCC + snapshot isolation\n│   ├── btree.py                  Advanced: B+ Tree storage engine\n│   ├── sql_parser.py             Advanced: SQL parser \u0026 executor\n│   ├── sharding.py               Advanced: Consistent hashing + vnodes\n│   ├── raft.py                   Advanced: Raft consensus\n│   ├── auth_pool.py              Advanced: Auth + RBAC + connection pool\n│   └── metrics.py                Advanced: Metrics + tracing + slow log\n│\n└── litedb-java/                 ← Java implementation (complete — 13 modules, ~3000 lines)\n```\n\n---\n\n## Implementation modules\n\n### Basic — `run_demo.py`\n\n| File | Concept | Key algorithm |\n|------|---------|---------------|\n| `wal.py` | Write-Ahead Log | Append-only log; replay on crash |\n| `memtable.py` | MemTable | Sorted dict; O(log n) writes |\n| `sstable.py` | SSTable + Bloom Filter | Binary search; probabilistic membership |\n| `lsm_engine.py` | LSM-Tree | WAL + MemTable + SSTable + Compaction |\n| `query_parser.py` | Command parser | Tokenize → parse → execute |\n| `server.py` | TCP server | Multi-client; pipelined protocol |\n| `replication.py` | Async replication | WAL streaming; primary/replica |\n| `gossip.py` | Gossip membership | SWIM/Cassandra-style; seed discovery; heartbeat liveness |\n\n### Advanced — run via `run_demo.py`\n\n| File | Concept | Key algorithm |\n|------|---------|---------------|\n| `transactions.py` | MVCC | Versioned writes; snapshot isolation; VACUUM |\n| `btree.py` | B+ Tree | Sorted pages; node splits; linked leaves |\n| `sql_parser.py` | SQL engine | Tokenizer → AST → planner → executor |\n| `sharding.py` | Consistent hashing | Hash ring; virtual nodes; rebalancing |\n| `raft.py` | Raft consensus | Leader election; log replication; majority commit |\n| `auth_pool.py` | Auth + pooling | PBKDF2; RBAC; pool; token bucket |\n| `metrics.py` | Observability | Counters/gauges/histograms; slow log; tracing |\n\n---\n\n## Architecture\n\n```\n┌──────────────────────────────────────────────────────────────────┐\n│                        LiteDB Architecture                        │\n│                                                                  │\n│  Client (TCP / Python API)                                       │\n│      │                                                           │\n│      ▼                                                           │\n│  Auth \u0026 RBAC · Rate Limiter                    (auth_pool)       │\n│      │                                                           │\n│      ▼                                                           │\n│  SQL Parser \u0026 Query Planner · Command Parser   (sql_parser,      │\n│                                                 query_parser)    │\n│      │                                                           │\n│      ▼                                                           │\n│  Transaction Manager — MVCC                    (transactions)    │\n│  Snapshot isolation · Write-write conflict · VACUUM              │\n│      │                                                           │\n│      ▼                                                           │\n│  ┌──────────────────────┐   ┌──────────────────────────────┐    │\n│  │  LSM-Tree            │   │  B+ Tree (btree)             │    │\n│  │  wal → memtable      │   │  Sorted pages · O(log n)     │    │\n│  │  → sstable → compact │   │  Range scans via linked leaves│   │\n│  └──────────────────────┘   └──────────────────────────────┘    │\n│      │                                                           │\n│      ▼                                                           │\n│  Sharding — Consistent Hashing + Virtual Nodes (sharding)        │\n│  Replication — Async WAL Streaming             (replication)     │\n│  Consensus — Raft Leader Election + Log Repl.  (raft)            │\n│      │                                                           │\n│      ▼                                                           │\n│  Prometheus Metrics · Slow Query Log · Tracing (metrics)         │\n│  Connection Pool                               (auth_pool)       │\n└──────────────────────────────────────────────────────────────────┘\n```\n\n---\n\n## Theory curriculum\n\nThe [`docs/`](./docs/) directory contains 10 modules and 18 deep-dive articles covering every concept implemented in the code:\n\n- **[Module 01 — Fundamentals](./docs/01-fundamentals/)** — architecture, WAL, buffer pool, isolation levels, locks\n- **[Module 02 — ACID](./docs/02-acid/)** — atomicity, consistency, durability, lock internals\n- **[Module 03 — Storage Engines](./docs/03-storage-engines/)** — LSM-Tree, B+ Tree, compaction, page layout\n- **[Module 04 — Indexing](./docs/04-indexing/)** — B+ Tree, Hash, Bloom Filter, Inverted, Composite\n- **[Module 05 — Query Processing](./docs/05-query-processing/)** — tokenizer, AST, planner, optimizer, executor\n- **[Module 06 — MVCC](./docs/06-mvcc/)** — versioned writes, snapshot isolation, VACUUM\n- **[Module 07 — Distributed Systems](./docs/07-distributed-systems/)** — CAP theorem, eventual consistency, CRDTs\n- **[Module 08 — Sharding](./docs/08-sharding/)** — range/hash partitioning, consistent hashing, virtual nodes\n- **[Module 09 — Replication](./docs/09-replication/)** — sync/async replication, consistency models, Raft\n- **[Module 10 — NoSQL Patterns](./docs/10-nosql/)** — denormalization, embedding, time-series, wide rows\n- **[Module 11 — Security: Auth, RBAC \u0026 Pooling](./docs/11-security/)** — PBKDF2 auth, RBAC, connection pooling, rate limiting\n- **[Module 12 — Metrics \u0026 Observability](./docs/12-observability/)** — counters/gauges/histograms, percentiles, slow query log, tracing\n\n→ **[Full docs index](./docs/README.md)**\n\n---\n\n## Concept → implementation map\n\n| Concept | Doc | Python | Java |\n|---------|-----|--------|------|\n| Write-Ahead Log | [Module 01](./docs/01-fundamentals/deep-dive-buffer-pool-and-wal.md) | `litedb-python/wal.py` | `com.litedb.wal.WriteAheadLog` |\n| MemTable | [Module 03](./docs/03-storage-engines/storage-engine-internals.md) | `litedb-python/memtable.py` | `com.litedb.memtable.MemTable` |\n| SSTable + Bloom Filter | [Module 03](./docs/03-storage-engines/storage-engine-internals.md) | `litedb-python/sstable.py` | `com.litedb.sstable.*` |\n| LSM-Tree + Compaction | [Module 03](./docs/03-storage-engines/storage-engine-internals.md) | `litedb-python/lsm_engine.py` | `com.litedb.lsm.LSMEngine` |\n| Command Parser | [Module 05](./docs/05-query-processing/query-processing-optimization.md) | `litedb-python/query_parser.py` | `com.litedb.query.QueryParser` |\n| TCP Server | [Module 05](./docs/05-query-processing/query-processing-optimization.md) | `litedb-python/server.py` | `com.litedb.server.LiteDBServer` |\n| Async Replication | [Module 09](./docs/09-replication/replication-consistency-models.md) | `litedb-python/replication.py` | `com.litedb.replication.ReplicationLog` |\n| MVCC Transactions | [Module 06](./docs/06-mvcc/mvcc-concurrency-control.md) | `litedb-python/transactions.py` | `com.litedb.txn.MVCCStore` |\n| B+ Tree | [Module 03](./docs/03-storage-engines/storage-engine-internals.md) | `litedb-python/btree.py` | `com.litedb.btree.BPlusTree` |\n| SQL Parser \u0026 Executor | [Module 05](./docs/05-query-processing/query-processing-optimization.md) | `litedb-python/sql_parser.py` | `com.litedb.sql.SQLParser` |\n| Consistent Hashing | [Module 08](./docs/08-sharding/sharding-partitioning.md) | `litedb-python/sharding.py` | `com.litedb.sharding.ConsistentHashRing` |\n| Raft Consensus | [Module 09](./docs/09-replication/replication-consistency-models.md) | `litedb-python/raft.py` | `com.litedb.raft.RaftNode` |\n| Auth + RBAC + Pool | [Module 11](./docs/11-security/auth-rbac-and-pooling.md) | `litedb-python/auth_pool.py` | `com.litedb.auth.AuthManager` |\n| Metrics + Tracing | [Module 12](./docs/12-observability/metrics-and-observability.md) | `litedb-python/metrics.py` | `com.litedb.metrics.MetricsRegistry` |\n\n---\n\n## Key concepts implemented\n\n| Concept | Where it's used in production |\n|---------|-------------------------------|\n| Write-Ahead Log | PostgreSQL, MySQL, SQLite |\n| LSM-Tree + Compaction | LevelDB, RocksDB, Cassandra, HBase |\n| Bloom Filter | RocksDB, Cassandra, BigTable |\n| B+ Tree | PostgreSQL, MySQL InnoDB, Oracle |\n| MVCC | PostgreSQL, MySQL InnoDB, CockroachDB |\n| Consistent Hashing | Cassandra, DynamoDB, Chord DHT |\n| Raft Consensus | etcd, CockroachDB, TiKV, Consul |\n| PBKDF2 Auth | Django, PostgreSQL, bcrypt family |\n| Token Bucket | AWS API Gateway, Nginx, Redis |\n| Prometheus Metrics | Kubernetes, Grafana stack |\n| Distributed Tracing | Jaeger, Zipkin, Datadog APM |\n\n---\n\n## Requirements\n\n- Python 3.10 or later\n- No external packages — pure stdlib only\n\n---\n\n## Contributing\n\nSee [CONTRIBUTING.md](./CONTRIBUTING.md). All contributions welcome — bug fixes, new modules, documentation improvements, and tests.\n\n---\n\n## License\n\n[MIT](./LICENSE)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fabhishek08aug%2Flitedb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fabhishek08aug%2Flitedb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fabhishek08aug%2Flitedb/lists"}