https://github.com/dj258255/minidb

A relational database built from scratch in C to dissect how PostgreSQL/MySQL work — pages, buffer pool, heap, B+Tree, SQL parser/executor, WAL, transactions.

b-tree c database database-internals dbms from-scratch learning-project sql storage-engine wal

# minidb

A small relational database written from scratch in C, built to dissect how
PostgreSQL and MySQL actually work inside. It goes from raw fixed-size pages all
the way up to running SQL: page storage, a buffer pool, a heap, a B+Tree index,
a hand-written SQL parser and executor, a write-ahead log, and transactions.

This is a learning project. The goal isn't to invent something new; it's to
reproduce the real structure accurately and understand it. Every layer is
covered by tests (182 checks across 13 suites).

![minidb REPL demo](docs/demo.svg)

## Quick start

```sh
make test # build and run the test suite
make repl # build the REPL
./build/minidb my.db # open (or create) a database and type SQL
```

A session:

```
minidb> CREATE TABLE users (id INT, name TEXT);
Table 'users' created (2 columns)
(index: id column)
minidb> INSERT INTO users VALUES (1, 'kim');
minidb> INSERT INTO users VALUES (2, 'lee');
minidb> SELECT * FROM users WHERE id = 2;
id | name
2 | lee
(1 row, index used)
minidb> BEGIN;
minidb> DELETE FROM users WHERE id = 1;
minidb> ROLLBACK;
minidb> SELECT * FROM users;
id | name
1 | kim
2 | lee
(2 rows)
```

Each table is stored as its own pair of files and survives a restart (the schema
is persisted too, so no need to re-run `CREATE TABLE`) -- see the storage layout
below.

## What's inside

Built bottom-up; each layer sits on the one below it.

| Layer | What it does | Mirrors |
|---|---|---|
| `pager.c` | maps fixed-size 4KB pages to offsets in a single file (offset = `page_id * PAGE_SIZE`) | SQLite pager, PG smgr |
| `page.c` | slotted page: pack variable-length rows into a page | PG/InnoDB page layout |
| `bufpool.c` | page cache with pin counts, dirty flags, LRU eviction | InnoDB buffer pool |
| `heap.c` | table = a collection of pages; rows addressed by `RID = (page, slot)` | PG heap |
| `sql.c` | hand-written lexer + recursive-descent parser (SQL -> AST) | every DB frontend |
| `db.c` | executor: tuple codec, multi-table catalog, joins (NLJ/index/hash), aggregates | pg_catalog, executor |
| `btree.c` | on-disk B+Tree index for O(log n) lookups, with node splits | InnoDB clustered index |
| `wal.c` | write-ahead log: durability and atomicity, with crash recovery | PG WAL / redo log |

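The pager row above boils down to one piece of arithmetic: a page lives at byte offset `page_id * PAGE_SIZE` in its file. The following is a minimal sketch of that idea, not minidb's actual `pager.c`. The function names `page_offset`, `pager_write`, and `pager_read` are hypothetical, and the use of stdio (`fseek`/`fread`/`fwrite`) is an assumption made for portability.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 4096 /* 4KB pages, as stated in the layer table */

/* Byte offset of a page inside the single backing file.
 * Note: `long` may be 32 bits on some platforms, which caps the file size. */
long page_offset(uint32_t page_id) {
    return (long)page_id * PAGE_SIZE;
}

/* Write one full page at its slot in the file. Returns 0 on success, -1 on error. */
int pager_write(FILE *f, uint32_t page_id, const void *buf) {
    assert(f != NULL && buf != NULL);
    if (fseek(f, page_offset(page_id), SEEK_SET) != 0) return -1;
    if (fwrite(buf, 1, PAGE_SIZE, f) != PAGE_SIZE) return -1;
    return fflush(f) == 0 ? 0 : -1;
}

/* Read one full page from its slot in the file. Returns 0 on success, -1 on error. */
int pager_read(FILE *f, uint32_t page_id, void *buf) {
    assert(f != NULL && buf != NULL);
    if (fseek(f, page_offset(page_id), SEEK_SET) != 0) return -1;
    return fread(buf, 1, PAGE_SIZE, f) == PAGE_SIZE ? 0 : -1;
}
```

Because every page has the same size, no directory of page locations is needed; the page id alone is enough to find the bytes.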
### Storage layout

Like PostgreSQL (each relation is its own file, `relfilenode`), every table lives
in its own files, and a catalog file lists which tables exist:

```
mydb catalog -- table names + schemas (like pg_class)
mydb.users.tbl users rows (heap)
mydb.users.idx users PK index (B+Tree)
mydb.orders.tbl orders rows
mydb.orders.idx orders PK index
```
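As an illustration of the naming scheme shown above, the sketch below builds a per-table file path from a database path, a table name, and an extension. The helper name `table_file_path` and its signature are hypothetical; only the `<db>.<table>.tbl` / `<db>.<table>.idx` pattern comes from the layout listing.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "<db>.<table>.<ext>" into out (capacity cap bytes).
 * Returns 0 on success, -1 if the result would not fit. */
int table_file_path(char *out, size_t cap, const char *db,
                    const char *table, const char *ext) {
    assert(out != NULL && db != NULL && table != NULL && ext != NULL);
    int n = snprintf(out, cap, "%s.%s.%s", db, table, ext);
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}
```

Calling it with `"mydb"`, `"users"`, and `"tbl"` yields `mydb.users.tbl`; the same call with `"idx"` yields the index file name.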

## SQL supported

```
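CREATE TABLE <table> (<col> INT|TEXT, ...)
INSERT INTO <table> VALUES (<value>, ...)
SELECT <* | item, ...> FROM <table> [<alias>] [JOIN <table> [<alias>] ON <col> = <col>]...
    [WHERE <cond> [AND <cond>] [OR ...]]
    [GROUP BY <col>] [ORDER BY <col> [ASC|DESC]] [LIMIT <n>]
UPDATE <table> SET <col> = <value> [WHERE ...]
DELETE FROM <table> [WHERE ...]
BEGIN | COMMIT | ROLLBACK

<item> is <col> | COUNT(*) | COUNT|SUM|MIN|MAX|AVG(<col>)
<cond> is <col> <op> <value>, <op> is one of = != < > <= >=
<col> is [<alias>.]<name>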
```

An `=`, `<`, `>`, `<=`, or `>=` on the first column (an `INT` primary key) uses
the B+Tree index -- `=` is an O(log n) point lookup, the others walk the linked
leaf chain as a range scan. `!=`, conditions on other columns, and compound
`AND` conditions fall back to a full scan -- the kind of choice a query planner
makes.

`ORDER BY`/`LIMIT` and `GROUP BY`/aggregates take a materialize path (collect,
then sort / sort-group).

`JOIN` is a recursive N-way join that picks a method per level like an optimizer:
index nested-loop (`btree_search`) when the inner table's primary key is the
`ON` key, hash join (build a hash table on the inner table's join column, then
O(1) probe) for any other equi-join, else a plain nested-loop scan.

Transactions use a no-steal + force-at-commit policy across every table and roll
back both the heap and the index.

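The access-path and join-method rules described above can be written as two small decision functions. This is a hedged sketch of the rules as stated in this README, not the code in `db.c`. All type and function names here are hypothetical, and treating "exactly one condition" as the trigger for index use is an assumption drawn from the statement that compound conditions fall back to a full scan.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { OP_EQ, OP_NE, OP_LT, OP_GT, OP_LE, OP_GE } CmpOp;
typedef enum { ACCESS_FULL_SCAN, ACCESS_INDEX_POINT, ACCESS_INDEX_RANGE } AccessPath;
typedef enum { JOIN_INDEX_NLJ, JOIN_HASH, JOIN_NESTED_LOOP } JoinMethod;

/* col_index: position of the filtered column (0 is the INT primary key).
 * n_conds:   number of conditions in the WHERE clause (0 means no WHERE). */
AccessPath choose_access_path(int col_index, CmpOp op, int n_conds) {
    assert(col_index >= 0 && n_conds >= 0);
    /* No WHERE, compound WHERE, non-key column, or != : scan every row. */
    if (n_conds != 1 || col_index != 0 || op == OP_NE) return ACCESS_FULL_SCAN;
    /* '=' is a point lookup; < > <= >= walk the leaf chain as a range scan. */
    return op == OP_EQ ? ACCESS_INDEX_POINT : ACCESS_INDEX_RANGE;
}

/* is_equi_join:    the ON clause is a single '=' between two columns.
 * inner_key_is_pk: the inner table's ON column is its primary key. */
JoinMethod choose_join_method(bool is_equi_join, bool inner_key_is_pk) {
    if (is_equi_join && inner_key_is_pk) return JOIN_INDEX_NLJ; /* probe the B+Tree */
    if (is_equi_join) return JOIN_HASH;                         /* build, then probe */
    return JOIN_NESTED_LOOP;                                    /* plain scan */
}
```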
See `DESIGN.md` for the full layer map and build order.

## Scope (honest limitations)

Kept simple on purpose: the first column of each table is treated as a unique
integer primary key; `WHERE` is in disjunctive normal form (AND-groups joined by
OR, no parentheses); joins are INNER only, each `ON` is a single `=`, chained up
to 4 tables (aliases supported, so self-joins work); projection/aggregation and
`GROUP BY` are single-table and don't combine with `ORDER BY`; and there is no
isolation/concurrency (one transaction at a time). B+Tree deletion isn't
implemented (deleted rows are tombstoned in the heap, so a stale index entry is
harmless). These are noted in the code where they matter.

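To make the disjunctive-normal-form restriction concrete, here is a minimal sketch of evaluating such a `WHERE` clause: a row matches if every condition in at least one AND-group holds. The `Cond` and `AndGroup` types and the `where_matches` function are hypothetical, and rows are simplified to arrays of `int` (minidb also has `TEXT` columns, which this sketch ignores).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { OP_EQ, OP_NE, OP_LT, OP_GT, OP_LE, OP_GE } CmpOp;
typedef struct { int col; CmpOp op; int value; } Cond;     /* <col> <op> <value> */
typedef struct { const Cond *conds; size_t n; } AndGroup;  /* conds joined by AND */

/* Evaluate a single comparison against an INT-only row. */
bool cond_holds(const Cond *c, const int *row) {
    int v = row[c->col];
    switch (c->op) {
    case OP_EQ: return v == c->value;
    case OP_NE: return v != c->value;
    case OP_LT: return v <  c->value;
    case OP_GT: return v >  c->value;
    case OP_LE: return v <= c->value;
    case OP_GE: return v >= c->value;
    }
    return false;
}

/* DNF: the groups are joined by OR, so one fully satisfied group is enough.
 * With zero groups this returns false; a caller would treat a missing
 * WHERE clause as "match everything" before calling this. */
bool where_matches(const AndGroup *groups, size_t n_groups, const int *row) {
    assert(row != NULL);
    for (size_t g = 0; g < n_groups; g++) {
        bool all = true;
        for (size_t i = 0; i < groups[g].n && all; i++)
            all = cond_holds(&groups[g].conds[i], row);
        if (all) return true;
    }
    return false;
}
```

For example, `WHERE id > 1 AND age < 10 OR id = 7` becomes two groups: `{id > 1, age < 10}` and `{id = 7}`. Without parentheses, no other grouping can be expressed, which is what keeps the parser and evaluator this small.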
## License

MIT