{"id":15111481,"url":"https://github.com/lucasbn/sqlite-clone","last_synced_at":"2026-01-19T00:32:38.486Z","repository":{"id":248396374,"uuid":"828570456","full_name":"lucasbn/sqlite-clone","owner":"lucasbn","description":"Serverless SQL database stored in a single file","archived":false,"fork":false,"pushed_at":"2024-10-21T20:47:01.000Z","size":651,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-11T14:26:17.922Z","etag":null,"topics":["databases","sqlite"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lucasbn.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-14T14:48:08.000Z","updated_at":"2024-10-21T20:47:04.000Z","dependencies_parsed_at":"2024-10-22T19:32:54.331Z","dependency_job_id":null,"html_url":"https://github.com/lucasbn/sqlite-clone","commit_stats":{"total_commits":47,"total_committers":1,"mean_commits":47.0,"dds":0.0,"last_synced_commit":"c7772b4ba196e510a3c53f0fce4754f9a30067b4"},"previous_names":["lucasbn/sqlite-clone"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucasbn%2Fsqlite-clone","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucasbn%2Fsqlite-clone/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucasbn%2Fsqlite-clone/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucasbn%2Fsqlite-clone/mani
fests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lucasbn","download_url":"https://codeload.github.com/lucasbn/sqlite-clone/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247371300,"owners_count":20928176,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["databases","sqlite"],"created_at":"2024-09-26T00:20:26.483Z","updated_at":"2026-01-19T00:32:38.442Z","avatar_url":"https://github.com/lucasbn.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SQLite3 Clone - A Serverless Single File Database\n\nA simple SQL database engine written in Go, which reads and writes data in a\nsingle file in the SQLite3 [file format](https://www.sqlite.org/fileformat.html#storage_of_the_sql_database_schema).\n\nYou can run the database engine with the following command:\n\n```\n./sqlite sample.db\n```\n\nThis project is in its early stages of development, which means that it does\nnot yet support many features.\n\nThe section on architecture below will go into more depth about the various\nlayers that make up the engine, but at present there are only implementations\nfor the `Machine`, `BTreeEngine` and `Pager` layers.\n\nThis means that you cannot actually pass arbitrary SQL - the `Parser` and\n`Generator` layers are mocked and return hard-coded instructions.\n\n# Architecture\n\n### Bird's-Eye View\n\n![Bird's-Eye View](images/architecture-birds-eye-view.png)\n\nThe image above is a visual depiction of the `DatabaseEngine`, which is\nresponsible for gluing together 
independent layers and providing an interface\nfor executing SQL statements and `meta` commands (e.g. `.dbinfo`).\n\nThere are five layers: `Pager`, `BTreeEngine`, `Machine`, `Generator` and `Parser`. Each\nof these layers is designed to be completely independent of other layers. A\nlayer is responsible for defining the interface by which it expects to make calls\nto another layer.\n\nThis means that layers are decoupled, and don't rely on the implementation\ndetails of any other layers. The reasoning for this is so that layers can be\nindependently tested and so that the entire functionality of a layer resides\nin the same package (although they are allowed to have sub-packages), which helps\nto create a 'separation of concerns'.\n\nIn the future, I'd like to potentially create different implementations of some\nlayers and figure out a way to profile them (i.e. compare CPU and memory usage).\n\nThere is a global package for `types` which contains an `Entry`, a union\nof a `null` value, a `string` and an `int`. These are the types of values that can\nbe stored in the database, which is a detail that only the `BTreeEngine` really\nneeds to be concerned with.\n\nTherefore, we can initialise the `Machine` with a generic type `T` which represents\nthe 'output' type of the machine - the `Machine` can take any type here. However,\nthe `BTreeEngine` requires a constructor for its generic type `T` that conforms to the following interface:\n\n```go\ntype resultTypeConstructor[T any] interface {\n    Number(uint64) T\n    Text(string) T\n    Null() T\n}\n```\n\nThis is the `BTreeEngine`'s way of saying 'I need you to provide me a type that\nknows how to construct a `T` from each kind of value that I store'.\n\nThe intention here is to ensure that all layers are independent entities that don't\nrely on external packages - they are valid packages in isolation that fulfill their\nsingle responsibility. I'm not confident that this is the right thing to do here,\nand I'm open to criticism/alternatives. 
\n\nOne argument against this goes: \"`Entry` is basically a\nprimitive type like `int` or `string` and so it is completely acceptable to import\nit everywhere and have packages depend on it.\"\n\nThe \"each layer needs to be independent and not rely on external packages\" goal /\npattern is _not_ so that I can publish these as separate packages that other developers\ncan rely on - it's to help make the code more readable and maintainable for myself\nand anyone else that works on this project. This makes me wonder if the `resultTypeConstructor`\nactually helps with readability/maintainability, or if I'm just doing it to remove\nany external dependencies for packages. I have a feeling it might be the latter and\nI'll want to reconsider this change.\n\n### Layer 1: Pager\n\nThe pager is responsible for efficiently reading and writing pages to/from a file,\nand is not concerned with things like:\n\n1. The contents of a page\n2. The format of the file\n3. The 'context' of the file (e.g. a database)\n\nIt could be used as a means for interacting with _any_ page-based file, and\nconforms to the following interface:\n\n```go\ntype Pager interface {\n    Close() error\n    PageSize() uint64\n    ReservedSpace() uint64\n    GetPage(pageNum uint64) ([]byte, error)\n}\n```\n\nIt can be instantiated with the following constructor:\n\n```go\nfunc NewPager(filepath string, config PagerConfig) (*Pager, error)\n```\n\nwhere the `config` contains the page size and the number of reserved bytes at the\nend of each page.\n\nThe pager has a cache, which is implemented as a hash map from page\nnumbers to byte slices (each the length of a page).\n\nThere are likely a huge number of optimisations that could be done here, and I'm\ndeliberately doing none of them now (other than the very simple cache described\nabove) because my primary aim is to get an end-to-end simple, extensible and\nfunctional system.\n\nCurrently, the pager takes a filepath and 
opens the file itself. I'd like to spend\nsome time thinking about what consequences this has for testing - would it be better\nto pass in a file that is opened elsewhere so that I could potentially mock a file\nin unit tests?\n\n### Layer 2: BTreeEngine\n\nThe `BTreeEngine` is responsible for providing an interface for callers to interact\nwith B-Tree structures encoded in the database file.\n\n```go\ntype BTreeEngine[T any] interface {\n    NewCursor(id uint64, rootPageNum uint64) (bool, error)\n    RewindCursor(id uint64) (bool, error)\n    AdvanceCursor(id uint64) (bool, error)\n    ReadColumn(id uint64, column uint64) (T, error)\n}\n```\n\nThis is done through 'cursors', each of which points to a particular entry in a\nB-Tree.\n\nWhen a cursor is initialised with `NewCursor()` it points to the first byte on\nthe root page, and is set to point at the first B-Tree entry via the `RewindCursor()`\nfunction. `AdvanceCursor()` moves the cursor along to the next entry, and `ReadColumn()`\nreads a particular column in the entry that is currently being pointed to.\n\nThis is the only layer that knows about the SQLite3 file format, which means that\nthis is the only layer that would need to change if the file format were to change.\n\nIt can be instantiated with the following constructor:\n\n```go\nfunc NewBTreeEngine[T any](pager pager, resultConstructor resultTypeConstructor[T]) (*BTreeEngine[T], error)\n```\n\nThe `resultTypeConstructor` has been discussed in detail above, but it is something\nwhich tells the `BTreeEngine` how to construct objects of type `T` out of (in this case)\n`int`s, `string`s or `null`s - the `BTreeEngine` is responsible for specifying which\nprimitive types it stores and hence needs to create `T`s from.\n\n### Layer 3: Machine\n\nThe `Machine` is a simple virtual machine that executes a set of bytecode instructions.\nThe bytecode instruction set is defined in the `instructions` package, and each\ninstruction must conform to the following 
interface:\n\n```go\ntype Instruction[T any] interface {\n\tExecute(s *state.MachineState[T], b common.BTreeEngine[T]) [][]T\n}\n```\n\nArguments are defined directly on implementations of this interface, and are used\nto update the machine state `s` and are able to access the database via `b` which is\nthe `BTreeEngine`.\n\nThe return type is `[][]T` which means an instruction can return multiple tuples of \ntype `T`.\n\nA machine is instantiated with the following constructor:\n\n```go\nfunc NewMachine[T any](config MachineConfig[T]) *Machine[T]\n```\n\nThe `config` contains a list of instructions and a pointer to a `BTreeEngine`.\nThe machine then initialises its `state` which stores the current address, the registers\nand whether or not the machine is halted.\n\nWhen `Run` is called on a machine, it will pick the instruction at the current address\nand call `Execute` on it. This will happen continuously in a loop until the machine\nis halted (by setting `halted` to `true` on the `state`).\n\nA `register` is a map from an `int` (register number) to a `T` which means that\na register stores individual entries from a tuple.\n\n### Layer 4: Bytecode Generator\n\nTODO\n\n### Layer 5: Parser\n\nTODO\n\n# Useful links\n\n- [SQLite3 File Format](https://www.sqlite.org/fileformat.html#storage_of_the_sql_database_schema)\n- [SQLite3 Opcodes](https://www.sqlite.org/opcode.html)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flucasbn%2Fsqlite-clone","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flucasbn%2Fsqlite-clone","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flucasbn%2Fsqlite-clone/lists"}