{"id":13503721,"url":"https://github.com/erikgrinaker/toydb","last_synced_at":"2025-04-09T01:22:42.519Z","repository":{"id":37623733,"uuid":"183929744","full_name":"erikgrinaker/toydb","owner":"erikgrinaker","description":"Distributed SQL database in Rust, written as an educational project","archived":false,"fork":false,"pushed_at":"2024-09-28T13:55:35.000Z","size":4491,"stargazers_count":6209,"open_issues_count":1,"forks_count":574,"subscribers_count":89,"default_branch":"master","last_synced_at":"2024-10-29T15:06:43.165Z","etag":null,"topics":["database","distributed","mvcc","raft","rust","sql"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/erikgrinaker.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-04-28T16:02:59.000Z","updated_at":"2024-10-29T14:40:19.000Z","dependencies_parsed_at":"2024-04-13T09:49:00.628Z","dependency_job_id":"92f77e99-1163-49ff-ac07-91a808cdfb5c","html_url":"https://github.com/erikgrinaker/toydb","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/erikgrinaker%2Ftoydb","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/erikgrinaker%2Ftoydb/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/erikgrinaker%2Ftoydb/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/erikgrinaker%2Ftoydb/manifests","owner_url":"https://repos.ecosyste.ms/api
/v1/hosts/GitHub/owners/erikgrinaker","download_url":"https://codeload.github.com/erikgrinaker/toydb/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247953862,"owners_count":21024102,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["database","distributed","mvcc","raft","rust","sql"],"created_at":"2024-07-31T23:00:44.027Z","updated_at":"2025-04-09T01:22:37.485Z","avatar_url":"https://github.com/erikgrinaker.png","language":"Rust","funding_links":[],"categories":["Applications","Databases","Rust","数据库管理系统","sql","\u003ca name=\"Rust\"\u003e\u003c/a\u003eRust"],"sub_categories":["Database","网络服务_其他"],"readme":"# \u003ca\u003e\u003cimg src=\"./docs/images/toydb.svg\" height=\"40\" valign=\"top\" /\u003e\u003c/a\u003e toyDB\n\n[![CI](https://github.com/erikgrinaker/toydb/actions/workflows/ci.yml/badge.svg)](https://github.com/erikgrinaker/toydb/actions/workflows/ci.yml)\n\nDistributed SQL database in Rust, built from scratch as an educational project. 
Main features:\n\n* [Raft distributed consensus engine][raft] for linearizable state machine replication.\n\n* [ACID transaction engine][txn] with MVCC-based snapshot isolation.\n\n* [Pluggable storage engine][storage] with [BitCask][bitcask] and [in-memory][memory] backends.\n\n* [Iterator-based query engine][query] with [heuristic optimization][optimizer] and time-travel \n  support.\n\n* [SQL interface][sql] including joins, aggregates, and transactions.\n\nOriginally written to learn more about database internals, toyDB is intended to illustrate the basic\narchitecture and concepts of distributed SQL databases. It focuses on simplicity and \nunderstandability, and should be functional and correct. Other aspects like performance,\nscalability, and availability are explicit non-goals -- these are major sources of complexity in \nproduction-grade  databases, which obscure the basic underlying concepts. Shortcuts have been taken \nwhere possible.\n\n[raft]: https://github.com/erikgrinaker/toydb/blob/master/src/raft/mod.rs\n[txn]: https://github.com/erikgrinaker/toydb/blob/master/src/storage/mvcc.rs\n[storage]: https://github.com/erikgrinaker/toydb/blob/master/src/storage/engine.rs\n[bitcask]: https://github.com/erikgrinaker/toydb/blob/master/src/storage/bitcask.rs\n[memory]: https://github.com/erikgrinaker/toydb/blob/master/src/storage/memory.rs\n[query]: https://github.com/erikgrinaker/toydb/blob/master/src/sql/execution/execute.rs\n[optimizer]: https://github.com/erikgrinaker/toydb/blob/master/src/sql/planner/optimizer.rs\n[sql]: https://github.com/erikgrinaker/toydb/blob/master/src/sql/mod.rs\n\n## Documentation\n\n* [Architecture guide](docs/architecture.md): overview of toyDB's architecture and implementation.\n\n* [SQL examples](docs/examples.md): walkthrough of toyDB's SQL features.\n\n* [SQL reference](docs/sql.md): toyDB SQL reference documentation.\n\n* [References](docs/references.md): books and other materials used while building toyDB.\n\n## 
Usage\n\nWith a [Rust compiler](https://www.rust-lang.org/tools/install) installed, a local five-node \ncluster can be built and started as:\n\n```\n$ ./cluster/run.sh\nStarting 5 nodes on ports 9601-9605 with data under cluster/*/data/.\nTo connect to node 5, run: cargo run --release --bin toysql\n\ntoydb4 21:03:55 [INFO] Listening on [::1]:9604 (SQL) and [::1]:9704 (Raft)\ntoydb1 21:03:55 [INFO] Listening on [::1]:9601 (SQL) and [::1]:9701 (Raft)\ntoydb2 21:03:55 [INFO] Listening on [::1]:9602 (SQL) and [::1]:9702 (Raft)\ntoydb3 21:03:55 [INFO] Listening on [::1]:9603 (SQL) and [::1]:9703 (Raft)\ntoydb5 21:03:55 [INFO] Listening on [::1]:9605 (SQL) and [::1]:9705 (Raft)\ntoydb2 21:03:56 [INFO] Starting new election for term 1\n[...]\ntoydb2 21:03:56 [INFO] Won election for term 1, becoming leader\n```\n\nA command-line client can be built and used with node 5 on `localhost:9605`:\n\n```\n$ cargo run --release --bin toysql\nConnected to toyDB node n5. Enter !help for instructions.\ntoydb\u003e CREATE TABLE movies (id INTEGER PRIMARY KEY, title VARCHAR NOT NULL);\ntoydb\u003e INSERT INTO movies VALUES (1, 'Sicario'), (2, 'Stalker'), (3, 'Her');\ntoydb\u003e SELECT * FROM movies;\n1, 'Sicario'\n2, 'Stalker'\n3, 'Her'\n```\n\ntoyDB supports most common SQL features, including joins, aggregates, and transactions.\n\nBelow is an `EXPLAIN` query plan of a more complex query, fetching movies from studios that have\nreleased movies with an IMDb rating of 8 or more:\n\n```\ntoydb\u003e EXPLAIN SELECT m.title, g.name AS genre, s.name AS studio, m.rating\n  FROM movies m JOIN genres g ON m.genre_id = g.id,\n    studios s JOIN movies good ON good.studio_id = s.id AND good.rating \u003e= 8\n  WHERE m.studio_id = s.id\n  GROUP BY m.title, g.name, s.name, m.rating, m.released\n  ORDER BY m.rating DESC, m.released ASC, m.title ASC;\n\nRemap: m.title, genre, studio, m.rating (dropped: m.released)\n└─ Order: m.rating desc, m.released asc, m.title asc\n   └─ Projection: m.title, 
g.name as genre, s.name as studio, m.rating, m.released\n      └─ Aggregate: m.title, g.name, s.name, m.rating, m.released\n         └─ HashJoin: inner on m.studio_id = s.id\n            ├─ HashJoin: inner on m.genre_id = g.id\n            │  ├─ Scan: movies as m\n            │  └─ Scan: genres as g\n            └─ HashJoin: inner on s.id = good.studio_id\n               ├─ Scan: studios as s\n               └─ Scan: movies as good (good.rating \u003e 8 OR good.rating = 8)\n```\n\n## Architecture\n\ntoyDB's architecture is fairly typical for a distributed SQL database: a transactional\nkey/value store managed by a Raft cluster with a SQL query engine on top. See the\n[architecture guide](./docs/architecture.md) for more details.\n\n[![toyDB architecture](./docs/images/architecture.svg)](./docs/architecture.md)\n\n## Tests\n\ntoyDB mainly uses [Goldenscripts](https://github.com/erikgrinaker/goldenscript) for tests. These \nscript various scenarios, capture events and output, and later assert that the behavior remains the \nsame. See e.g.:\n\n* [Raft cluster tests](https://github.com/erikgrinaker/toydb/tree/master/src/raft/testscripts/node)\n* [MVCC transaction tests](https://github.com/erikgrinaker/toydb/tree/master/src/storage/testscripts/mvcc)\n* [SQL execution tests](https://github.com/erikgrinaker/toydb/tree/master/src/sql/testscripts)\n* [End-to-end tests](https://github.com/erikgrinaker/toydb/tree/master/tests/scripts)\n\nRun tests with `cargo test`, or have a look at the latest \n[CI run](https://github.com/erikgrinaker/toydb/actions/workflows/ci.yml).\n\n## Benchmarks\n\ntoyDB is not optimized for performance, but comes with a `workload` benchmark tool that can run \nvarious workloads against a toyDB cluster. For example:\n\n```sh\n# Start a 5-node toyDB cluster.\n$ ./cluster/run.sh\n[...]\n\n# Run a read-only benchmark via all 5 nodes.\n$ cargo run --release --bin workload read\nPreparing initial dataset... done (0.179s)\nSpawning 16 workers... 
done (0.006s)\nRunning workload read (rows=1000 size=64 batch=1)...\n\nTime   Progress     Txns      Rate       p50       p90       p99      pMax\n1.0s      13.1%    13085   13020/s     1.3ms     1.5ms     1.9ms     8.4ms\n2.0s      27.2%    27183   13524/s     1.3ms     1.5ms     1.8ms     8.4ms\n3.0s      41.3%    41301   13702/s     1.2ms     1.5ms     1.8ms     8.4ms\n4.0s      55.3%    55340   13769/s     1.2ms     1.5ms     1.8ms     8.4ms\n5.0s      70.0%    70015   13936/s     1.2ms     1.5ms     1.8ms     8.4ms\n6.0s      84.7%    84663   14047/s     1.2ms     1.4ms     1.8ms     8.4ms\n7.0s      99.6%    99571   14166/s     1.2ms     1.4ms     1.7ms     8.4ms\n7.1s     100.0%   100000   14163/s     1.2ms     1.4ms     1.7ms     8.4ms\n\nVerifying dataset... done (0.002s)\n```\n\nThe available workloads are:\n\n* `read`: single-row primary key lookups.\n* `write`: single-row inserts to sequential primary keys.\n* `bank`: bank transfers between various customers and accounts. To make things interesting, this\n  includes joins, secondary indexes, sorting, and conflicts.\n\nFor more information about workloads and parameters, run `cargo run --bin workload -- --help`.\n\nExample workload results are listed below. Write performance is pretty atrocious, due to fsyncs \nand a lack of write batching at the Raft level. Disabling fsyncs, or using the in-memory engine, \nsignificantly improves write performance.\n\n| Workload | BitCask     | BitCask w/o fsync | Memory      |\n|----------|-------------|-------------------|-------------|\n| `read`   | 14163 txn/s | 13941 txn/s       | 13949 txn/s |\n| `write`  | 35 txn/s    | 4719 txn/s        | 7781 txn/s  |\n| `bank`   | 21 txn/s    | 1120 txn/s        | 1346 txn/s  |\n\n## Debugging\n\n[VSCode](https://code.visualstudio.com) provides an intuitive environment for debugging toyDB.\nThe debug configuration is included under `.vscode/launch.json`, to use it:\n\n1. 
Install the [CodeLLDB](https://marketplace.visualstudio.com/items?itemName=vadimcn.vscode-lldb)\n   extension.\n\n2. Go to the \"Run and Debug\" tab and select e.g. \"Debug unit tests in library 'toydb'\".\n\n3. To debug the binary, select \"Debug executable 'toydb'\" under \"Run and Debug\".\n\n## Credits\n\nThe toyDB logo is courtesy of [@jonasmerlin](https://github.com/jonasmerlin).","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ferikgrinaker%2Ftoydb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ferikgrinaker%2Ftoydb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ferikgrinaker%2Ftoydb/lists"}