{"id":17032925,"url":"https://github.com/paolorechia/steeldb","last_synced_at":"2025-04-12T11:52:41.502Z","repository":{"id":214105926,"uuid":"735690793","full_name":"paolorechia/steeldb","owner":"paolorechia","description":"A simple database built from scratch in Rust","archived":false,"fork":false,"pushed_at":"2024-08-07T21:39:34.000Z","size":105,"stargazers_count":61,"open_issues_count":0,"forks_count":9,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-03-26T06:34:36.999Z","etag":null,"topics":["database","from-scratch","rust"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/paolorechia.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-12-25T20:53:10.000Z","updated_at":"2025-03-25T19:41:55.000Z","dependencies_parsed_at":"2024-01-10T07:52:39.866Z","dependency_job_id":"ad72764a-b54a-4097-b299-8f44fa3660f5","html_url":"https://github.com/paolorechia/steeldb","commit_stats":null,"previous_names":["paolorechia/steeldb"],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/paolorechia%2Fsteeldb","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/paolorechia%2Fsteeldb/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/paolorechia%2Fsteeldb/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/paolorechia%2Fsteeldb/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners
/paolorechia","download_url":"https://codeload.github.com/paolorechia/steeldb/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248565034,"owners_count":21125414,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["database","from-scratch","rust"],"created_at":"2024-10-14T08:30:59.932Z","updated_at":"2025-04-12T11:52:41.481Z","avatar_url":"https://github.com/paolorechia.png","language":"Rust","funding_links":[],"categories":["Rust"],"sub_categories":[],"readme":"# SteelDB\n\nThis is a study repository. This is mostly for personal use. Building a Database from scratch in Rust. Why not? 
:)\n\n**Source code**: https://github.com/paolorechia/steeldb/\n\n# Current version documentation: latest\n**Database**: https://docs.rs/steeldb/latest/steeldb\n\n**Parser**: https://docs.rs/steeldb-parser/latest/steeldb_parser/\n\n## Medium Articles\n**Iteration 1**: https://medium.com/@paolorechia/building-a-database-from-scratch-in-rust-part-1-6dfef2223673\n\n## Architecture\n\n### Embedded (Standalone Server)\n![image](https://github.com/paolorechia/steeldb/assets/5386983/198ad119-3231-44ea-97c5-0c4542a6e457)\n\n### Client-Server Crate Architecture / Dependencies\n![image](https://github.com/paolorechia/steeldb/assets/5386983/2ade0895-8d24-4129-a7fd-bad6da2f32d1)\n\n\n## How to use this version\n\nIt should be as simple as:\n\n```\ncargo add steeldb\ncargo run\n```\n\nThis should start a REPL:\n\n```\n------------------------------------------------\n|                                               |\n|   SteelDB                                     |\n|   version: 0.1.0                              |\n|                                               |\n------------------------------------------------\n\nType 'exit;' to leave this shell\nCurrent supported commands: [select]\n\n\u003e\u003e\n```\nThe only implemented clause is select, which selects columns of a previously constructed table.\nFor example:\n\n```\n\u003e\u003e select name, annual_salary;\n|---------------------------------|\n|    name   |    annual_salary    |\n|---------------------------------|\n| John Man  |       60000         |\n|   Lenon   |       200000        |\n|   Mary    |      3012000        |\n|---------------------------------|\n\u003e\u003e\n```\n\nCommands must always end with a `;`.\n\nIf you simply try the command above, you will instead see:\n\n```\n\u003e\u003e select name;\n\"Os { code: 2, kind: NotFound, message: \\\"No such file or directory\\\" }\"\n\n\n\u003c------------------ COMMAND FAILED 
------------------\u003e\n\"TableNotFound\"\n\n\u003c----------------------------------------------------\u003e\n\n\u003e\u003e \n```\n\nThis is because the table must be pre-created. You can either create one by running `cargo test` and copying it,\nor copy and paste this into the file `.steeldb/data/test_table.columnar`: \n\n```txt\nTABLE COLUMNAR FORMAT HEADER\nField name: final_grade; Type: f32; Number of elements: 3\n4.0\n3.2\n5\nField name: name; Type: String; Number of elements: 3\nJohn Man\nLenon\nMary\nField name: annual_salary; Type: i32; Number of elements: 3\n60000\n200000\n3012000\n\n```\n\n### Columnar Format\nAs you can see, the table format is very naive and verbose. It stores data in ASCII.\nIt's not meant to be efficient and will probably be replaced in the future.\n\n\n# More info\n\n## Useful Links:\n1. https://cstack.github.io/db_tutorial/parts/part1.html\n2. https://www.sqlite.org/arch.html\n3. https://build-your-own.org/database/\n\n\n# Roadmap\nThis is not a binding roadmap.\n\n## What do we need to build a Database like SQLite from scratch?\n\n1. A REPL shell\n3. A tokenizer\n4. A parser\n5. A code generator\n6. A virtual machine that interprets the generated code\n7. A B+ Tree\n8. Pager\n9. OS Interface\n\n\n### For the first iteration: the bare bones (v0.1.0)\nWe can simplify some components in the first iteration, so that we first have a working end-to-end system.\nWe can then tweak the individual components to have more capabilities.\n\nHere are some simplifications we can make for our first iteration:\n\n1. Support only a subset of the SQL syntax; for instance, start with only insert and select operations.\n2. Do not implement a B+ tree in the first iteration. Instead, use a vector or list of structs. \n3. Do not implement a pager in the first iteration.\n4. Keep the database persisted into a single file.\n5. Use a single statically defined table.\n\nOur roadmap for the first iteration might end up looking like this:\n\n1. A REPL shell [x]\n3. 
A tokenizer [x]\n4. A parser [x]\n5. Add support for the SELECT clause [x]\n6. A code generator [x]\n7. A virtual machine that interprets the generated code [x]\n8. A table struct that stores data in HashMap of Vectors [x]\n9. A hardcoded table for testing [x]\n10. Proper error propagation [x]\n11. Pretty printing of table in REPL [x]\n12. A serialization / deserialization method to write/read data from file [x]\n13. Clean up [x]\n    * Handle select columns properly in read method [x]\n    * Test that everything is working as expected [x]\n    * Tag v1.0 [x]\n\n\n### Second iteration: making it usable (v0.2.0)\n1. Add proper documentation to project [x]\n2. Add a proper logging strategy to the server [x]\n3. Split project into several crates [x]\n5. Add another API besides the REPL to query the database []\n   * This can be either a traditional TCP server or an HTTP server. It should be as simple as possible, and just receive a string of the SQL command\n   * Make REPL support both backends: Standalone process or network server\n6. Add configuration file []\n7. Add create table command []\n8. Add drop table command []\n9. Add alter table command []\n10. Multiple tables query support (add FROM clause support) []\n11. Support filters (add basic WHERE clause support) []\n12. Update documentation []\n\n\n\n### Third iteration: making it scalable (v0.3.0)\n1. Handle concurrency: needs research on approaches to use []\n3. Implement a B+ tree or a similar data structure.\n   * Ideally keep support for current columnar support []\n   * Support both in-memory and persistent\n4. Add caching (or pagination) []\n5. Support for transactions []\n\n\n### Fourth iteration: making it useful (v0.4.0)\n1. Implement inner join []\n2. Implement left / right join []\n3. Implement outer join []\n4. Implement nested operations, including WHERE IN (SELECT) []\n4. Implement aggregations []\n\n\n\n### Fifth iteration: making it time-aware (v0.5.0)\n1. 
Implement advanced SQL features\n  * Window []\n  * Having []\n3. Add date and timestamp types []\n4. Implement more SQL functions []\n\n\n### Sixth iteration: making it complete (v0.6.0)\n1. Anything important missing in SQL standards\n\n\n\n### Seventh iteration: making it compatible (v0.7.0)\n1. Implement the Spark.SQL API in Rust\n\n\n### Eighth iteration: making it attractive (v0.8.0)\n1. Build an SDK in Rust to use it\n\n\n### Ninth iteration: making it snaky (v0.9.0)\n1. Build Python bindings for the Rust SDK\n\n\n### Tenth iteration: placeholder (v1.0.0)\nIf we reach this point, we actually have an impressive system :)\n\n\n# Knowledge base\n\n## How do we build each of these components in Rust?\n\n### REPL Shell\nThis is probably the simplest component. We just need to implement a CLI shell that reads lines of input until the command end symbol is presented (like ';').\nIt then forwards the parsed string into the next layers of our system, and returns the result.\n\n\n### A Tokenizer and A Parser\nOne should be able to generate them using https://github.com/lalrpop/lalrpop.\nTutorial: http://lalrpop.github.io/lalrpop/tutorial/001_adding_lalrpop.html\n\n\n### A Code Generator\nThis should be somehow integrated into the parser. How this can be written (if possible, avoiding too much coupling) will become clearer once the first parser is built.\n\n\n### Virtual Machine\nProbably the easiest way is to implement it as a stack, so it can handle nested commands.\nFor each parsed command, we push it to the stack, and the virtual machine pops it and executes it.\nThis works assuming we parse the commands in a depth-first way, e.g., the innermost command is always parsed first, and its result is available to the next command execution.\nThis should happen naturally using an auto-generated lexer/parser with lalrpop, as long as the grammar is correctly defined.\n\nTo keep it simple, however, we should start without supporting nested commands. 
We can still have the stack in place and ready for extension in the future.\n\n### B+ Tree\nThe B+ Tree is a widely known data structure and is described in several places, for instance: https://en.wikipedia.org/wiki/B%2B_tree\nThis will be skipped until much later! The goal is to first get a usable system/product that supports concurrency and possibly exposes an (HTTP) API to query data.\nThis means things like transactions may have to be implemented first.  \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpaolorechia%2Fsteeldb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpaolorechia%2Fsteeldb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpaolorechia%2Fsteeldb/lists"}