https://github.com/spandanb/learndb-py
Learn database internals by implementing it from scratch.
- Host: GitHub
- URL: https://github.com/spandanb/learndb-py
- Owner: spandanb
- License: other
- Created: 2021-05-18T17:45:55.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2023-08-13T20:28:59.000Z (over 1 year ago)
- Last Synced: 2024-08-01T22:50:35.494Z (5 months ago)
- Topics: database
- Language: Python
- Homepage:
- Size: 1.18 MB
- Stars: 1,260
- Watchers: 11
- Forks: 54
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- StarryDivineSky - spandanb/learndb-py
README
# LearnDB
> What I Cannot Create, I Do Not Understand -Richard Feynman

In the spirit of Feynman's immortal words, the goal of this project is to better understand the internals of databases by
implementing a relational database management system (RDBMS) (sqlite clone) from scratch.

This project was motivated by a desire to: 1) understand databases more deeply and 2) work on a fun project. These dual
goals led to a:
- relatively simple code base
- relatively complete RDBMS implementation
- written in pure python
- No build step
- zero configuration
- configuration can be overridden

This makes the learndb codebase great for tinkering with. But the product has some key limitations that mean it
shouldn't be used as an actual storage solution.

### Features
Learndb supports the following:
- a rich SQL dialect (learndb-sql) with support for `select, from, where, group by, having, limit, order by`
- custom lexer and parser built using [`lark`](https://github.com/lark-parser/lark)
- at a high level, there is an engine that can accept some SQL statements. These statements express operations on a
  database (a collection of tables which contain data)
- allows users/agents to connect to the RDBMS in multiple ways:
  - REPL
  - importing python module
  - passing a file of commands to the engine
- on-disk btree implementation as backing data structure

### Limitations
- Very simplified [^1] implementation of floating point number arithmetic, e.g. compared to
  [IEEE754](https://en.wikipedia.org/wiki/IEEE_754).
- No support for common utility features, like wildcard column expansion, e.g. `select * ...`
- More [limitations](./docs/tutorial.md)

## Getting Started: Tinkering and Beyond
- To get started with `learndb`, first read [`tutorial.md`](docs/tutorial.md).
- Then, to understand the system at a deeper technical level, read [`reference.md`](docs/reference.md).
  This is essentially a complete reference manual directed at a user of the system. It outlines the operations and
  capabilities of the system, and describes what is (un)supported and undefined behavior.
- `Architecture.md` - this provides a component level breakdown of the repo and the system

## Hacking
### Install
- System requirements
  - requires a linux/macos system, since it uses `fcntl` to get exclusive read access on database file
  - python >= 3.9
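
The `fcntl` requirement is what ties learndb to Linux/macOS: the module exposes POSIX-style advisory file locks and is not available on Windows. The sketch below shows one way an exclusive lock on a database file could be taken. It is not learndb's actual code: the helper names `acquire_exclusive_lock` and `demo`, the choice of `fcntl.flock` (rather than `fcntl.lockf`), and the non-blocking flag are all assumptions made for illustration.

```python
import fcntl
import os
import tempfile


def acquire_exclusive_lock(path):
    """Open `path` and take a non-blocking exclusive advisory lock on it.

    Hypothetical helper, not part of learndb. Returns the open file object;
    the lock is held until that object is closed. Raises BlockingIOError if
    another open handle already holds the lock.
    """
    handle = open(path, "a+b")  # creates the file if missing, never truncates
    try:
        fcntl.flock(handle.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        handle.close()
        raise
    return handle


def demo():
    """Return True if a second lock attempt on the same file is refused."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    first = acquire_exclusive_lock(path)
    try:
        try:
            second = acquire_exclusive_lock(path)
        except BlockingIOError:
            second_refused = True
        else:
            second.close()
            second_refused = False
    finally:
        first.close()  # closing the handle releases the lock
        os.remove(path)
    return second_refused
```

Because these locks are advisory, they only protect the file from other processes that also ask for the lock; a program that opens the file without locking is not stopped.
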
- To install for development, i.e. so that src can be edited without having to reinstall:
  - `cd `
  - create virtualenv: `python3 -m venv venv`
  - activate venv: `source venv/bin/activate`
  - install requirements: `python -m pip install -r requirements.txt`
  - install `Learndb` in edit mode: `python3 -m pip install -e .`

### Run REPL
```
source venv/bin/activate
python run_learndb.py repl
```

### Run Tests
- Run all tests:
  - `python -m pytest tests/*.py`
- Run btree tests:
  - `python -m pytest -s tests/btree_tests.py` # stdout
  - `python -m pytest tests/btree_tests.py` # suppressed out
- Run end-to-end tests:
  - `python -m pytest -s tests/e2e_tests.py`
- Run end-to-end tests (employees):
  - `python -m pytest -s tests/e2e_tests_employees.py`
  - `python -m pytest -s tests/e2e_tests_employees.py -k test_equality_select`
- Run serde tests:
  - `... serde_tests.py`
- Run language parser tests:
  - `... lang_tests.py`
- Run specific test:
  - `python -m pytest tests.py -k test_name`
- Clear pytest cache:
  - `python -m pytest --cache-clear`

## References consulted
- I started this project by following cstack's awesome [tutorial](https://cstack.github.io/db_tutorial/)
- Later I was primarily referencing: [SQLite Database System: Design and Implementation (1st ed)](https://books.google.com/books?id=9Z6IQQnX1JEC&source=gbs_similarbooks)
- Sqlite file format: [docs](https://www.sqlite.org/fileformat2.html)
- Postgres for how certain SQL statements are implemented and how their [documentation](https://www.postgresql.org/docs/11/index.html) is organized

## Project Management
- imminent work/issues are tracked in `tasks.md`
- long-term ideas are tracked in `docs/future-work.md`

[^1]: When evaluating the difference between two floats, e.g. `3.2 > 4.2`, I consider the condition True if the
    difference between the two is some fixed delta. The accepted epsilon should scale with the magnitude of the number.
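
The float-comparison footnote contrasts a fixed delta with an epsilon that scales with magnitude. The sketch below illustrates that difference under one reading of the footnote, namely that `a > b` holds only when `a` exceeds `b` by more than the delta. The names `FIXED_DELTA`, `greater_fixed`, and `greater_scaled` and the tolerance values are hypothetical and are not taken from the learndb source; the scaled variant leans on the standard library's `math.isclose`.

```python
import math

FIXED_DELTA = 1e-6  # hypothetical value; the README does not state the delta used


def greater_fixed(a, b, delta=FIXED_DELTA):
    """Treat a > b as true only if a exceeds b by more than a fixed delta."""
    return a - b > delta


def greater_scaled(a, b, rel_tol=1e-9):
    """Treat a > b as true only if a is larger and the two values are not
    close relative to their own magnitude."""
    return a > b and not math.isclose(a, b, rel_tol=rel_tol)


if __name__ == "__main__":
    # Rounding noise is ignored by both approaches.
    print(greater_fixed(0.1 + 0.2, 0.3), greater_scaled(0.1 + 0.2, 0.3))
    # For small magnitudes the fixed delta hides a real difference:
    # 2e-7 is twice 1e-7, yet the gap is below FIXED_DELTA.
    print(greater_fixed(2e-7, 1e-7), greater_scaled(2e-7, 1e-7))
```

With a fixed delta, values much smaller than the delta all compare as equal, which is the weakness the footnote points at; a relative tolerance avoids that at the cost of needing a separate rule for comparisons against exactly zero.
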