{"id":20684587,"url":"https://github.com/jason0214/minisql","last_synced_at":"2025-04-22T13:23:54.341Z","repository":{"id":139645428,"uuid":"107049194","full_name":"Jason0214/MiniSQL","owner":"Jason0214","description":"a toy database","archived":false,"fork":false,"pushed_at":"2017-12-05T22:40:08.000Z","size":33806,"stargazers_count":4,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-03-29T15:34:24.966Z","etag":null,"topics":["database","interpreter","minisql"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Jason0214.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-10-15T21:07:14.000Z","updated_at":"2023-12-10T03:43:54.000Z","dependencies_parsed_at":"2023-07-23T08:00:54.998Z","dependency_job_id":null,"html_url":"https://github.com/Jason0214/MiniSQL","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Jason0214%2FMiniSQL","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Jason0214%2FMiniSQL/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Jason0214%2FMiniSQL/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Jason0214%2FMiniSQL/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Jason0214","download_url":"https://codeload.github.com/Jason0214/MiniSQL/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250246915,"owners_count":21398964,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["database","interpreter","minisql"],"created_at":"2024-11-16T22:23:01.337Z","updated_at":"2025-04-22T13:23:54.330Z","avatar_url":"https://github.com/Jason0214.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# MiniSQL\n\n- an updated version of the repo [ADS-DBAwesomeGroup/MiniSQL](https://github.com/ADS-DBAwesomeGroup/MiniSQL)  \n    + rewrite ***Interpreter*** (originally written in C#) to make it platform independent\n    + rewrite ***IndexManager***\n    + add an abstraction layer, ***RecordManager***\n    + rewrite ***API*** to make use of ***RecordManager***\n- **highlights**\n    + interpreter well separated from the underlying layers\n    + LR(1) interpreter built entirely by hand\n    + highly extensible table index methods\n    + highly extensible join methods\n- **weakness**\n    + too many pointers :) \n\n## Query Overview\n\n- take an SQL query and derive its relational algebra expression\n- execute the query strictly following that relational algebra
\n- while evaluating the relational algebra, create a temporary table at every step\n\n**e.g.**\n``` sql\nSELECT a1, a2 FROM T1 JOIN (SELECT * FROM T2, T3) WHERE t1.a1 \u003e 0;\n```\n\n![execute_graph](./image/relational_algebra_eg.png)\n\n## Storage On Disk\n- MiniSQL stores a database (including table data and metadata) on disk in a single file: \\*.db\n\n![db](./image/db_file.png)\n\n- Records in each table are sorted by an attribute: the primary key if one exists, otherwise one attribute chosen from the table's attributes. Every table has a primary index implemented as a B plus tree.\n\n![table_structure](./image/table_structure.png)\n\n## Module Description\n\n### Interpreter\n    built entirely by hand\n- **Naive Lexer**\n    + define the tokens\n    + build a regex pattern for each token type\n    + try every pattern at the current position and take the longest match\n- **LR(1) Parser**\n    + [grammar](https://github.com/Jason0214/MiniSQL/blob/master/doc/SQL%20Parser%20DFA.md)\n    + [state graph for parsing SQL](https://github.com/Jason0214/MiniSQL/blob/master/doc/SQL%20Parser%20DFA.pdf)\n    + states are all tagged with an enum, which keeps the parser readable\n    + bad: too many function calls, which costs performance\n- **Executor**\n    + traverse the abstract syntax tree (AST)\n    + use the ***API*** module to communicate with the underlying layer\n\n### API\n- separates the ***Interpreter*** from the low-level modules\n- implements the API calls\n\n### IndexManager\n    IndexManager contains the implementations of primary and secondary indexes\n- **Extensibility**\n    + a universal interface is provided in ***IndexEntry.h***\n- **Index Methods**\n    - [x] B plus tree index\n    - [ ] hash index\n\n### RecordManager\n    RecordManager has two jobs: managing all the tables used in a query, and providing multiple join methods\n- **Relevant Data Structures**\n    + ***Table***\n        * the ***Table*** class encapsulates either a ***materialized table*** or a ***temporary table*** and exposes the same interface to upper-level modules regardless of which 
type of table it is\n        * hides the details of handling ***materialized tables*** and ***temporary tables***\n    + ***materialized table*** \n        + tables in the database\n        + stored on disk\n        + treated as a collection of blocks\n    + ***temporary table*** \n        + tables created during the execution of a query\n        + stored in memory\n        + have no notion of blocks\n- **Table Management**\n    + at the beginning of every query execution, initialize an empty map\n    + when a temporary table is created, insert (key = table name, value = ***Table*** object) into the map\n    + other modules may request a table from ***RecordManager***; if the requested table is not in the map, ***RecordManager*** checks ***Catalog*** to see whether it is a materialized table, and throws an error if it is still not found.\n- **Materializing Temporary Tables**\n    + a temporary table may grow too large to fit in memory\n    + in that case ***RecordManager*** materializes it, i.e. writes it to disk\n    + since upper-level modules only manipulate data through the ***Table*** class, materializing a temporary table requires no changes in the upper levels\n- ***Join***\n    + different kinds of tables call for different join strategies. For example, a Block-Wise join is recommended for two materialized tables, while a Tuple-Wise join should be used for two temporary tables\n    + the different join functions are provided by ***RecordManager***\n\n### CatalogManager\n    stores and manages all the metadata in the database, such as table names, attribute names, the number of attributes, attribute types and so on\n- catalog manager in this project is **shit**\n- I used a B plus tree to index the metadata, expecting higher performance; however, it made the code very messy, and *only god knows what I was doing then*.\n\n### BufferManager\n    the only module that communicates with the disk; all other modules request blocks of data from BufferManager.
\n    Since disk access is slow, BufferManager also caches some blocks in memory.\n- **Block Buffer**\n    + BufferManager caches blocks in a hash table\n    + a block's offset in the \\*.db file is its hash key\n    + the hash table uses open addressing to resolve hash collisions\n- **LRU**\n    + when a block must be read into memory while the buffer is full, the LRU algorithm chooses a block to swap out.\n    + LRU is implemented with a doubly linked list in which every node points to a block in the buffer.\n    + after a block is accessed, the node pointing to it is moved to the head of the doubly linked list","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjason0214%2Fminisql","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjason0214%2Fminisql","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjason0214%2Fminisql/lists"}
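The README's Naive Lexer steps (one regex per token type, longest match wins) can be sketched in C++ roughly as follows. This is a minimal illustration under stated assumptions, not MiniSQL's code: the `TokenType` values, the patterns, `Rule`, and `tokenize` are all hypothetical, and ties are assumed to go to the rule listed first so that a keyword beats an identifier of the same length.

```cpp
#include <cstddef>
#include <regex>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical token types; MiniSQL's real token set lives in its own source.
enum class TokenType { Keyword, Identifier, Number, Operator, Whitespace };

struct Token {
    TokenType type;
    std::string text;
};

// One regex pattern per token type (assumed patterns, for illustration only).
struct Rule {
    TokenType type;
    std::regex pattern;
};

// At each position every pattern is tried; the longest match wins, and on a
// tie the rule listed first wins (so "SELECT" is a keyword, "selected" is not).
std::vector<Token> tokenize(const std::string& input) {
    static const std::vector<Rule> rules = {
        {TokenType::Keyword,
         std::regex("select|from|where|join", std::regex::ECMAScript | std::regex::icase)},
        {TokenType::Identifier, std::regex("[A-Za-z_][A-Za-z0-9_]*")},
        {TokenType::Number, std::regex("[0-9]+(\\.[0-9]+)?")},
        // Two-character operators come first because ECMAScript alternation
        // takes the first alternative that matches, not the longest one.
        {TokenType::Operator, std::regex("<=|>=|<>|[=<>*,;().]")},
        {TokenType::Whitespace, std::regex("[ \\t\\r\\n]+")},
    };

    std::vector<Token> tokens;
    auto pos = input.cbegin();
    while (pos != input.cend()) {
        std::ptrdiff_t bestLen = 0;
        const Rule* best = nullptr;
        for (const Rule& rule : rules) {
            std::smatch m;
            // match_continuous anchors the match at the current position.
            if (std::regex_search(pos, input.cend(), m, rule.pattern,
                                  std::regex_constants::match_continuous) &&
                m.length(0) > bestLen) {
                bestLen = m.length(0);
                best = &rule;
            }
        }
        if (best == nullptr) {
            throw std::runtime_error("unexpected character: " + std::string(1, *pos));
        }
        if (best->type != TokenType::Whitespace) {  // whitespace is dropped
            tokens.push_back({best->type, std::string(pos, pos + bestLen)});
        }
        pos += bestLen;
    }
    return tokens;
}
```

With these assumed rules, `SELECT` lexes as a keyword (a 6-character tie, broken in favour of the keyword rule), while `selected` lexes as one identifier because the identifier pattern matches 8 characters against the keyword pattern's 6. Trying every pattern separately keeps each token definition independent, at the cost of running several regexes per position.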
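The BufferManager description in the README (blocks keyed by their offset in the \*.db file, LRU order kept in a doubly linked list, an accessed block's node moved to the head) can be sketched as below. It is a hypothetical illustration, not MiniSQL's implementation: `LruBlockCache`, `Block`, `kBlockSize`, and the placeholder `load`/`writeBack` functions are assumptions, and `std::unordered_map` stands in for the open-addressing hash table the README describes, so its collision strategy differs.

```cpp
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>
#include <vector>

// Hypothetical block size; the README does not state MiniSQL's value.
constexpr std::size_t kBlockSize = 4096;

struct Block {
    std::uint64_t offset;     // offset of the block in the *.db file
    std::vector<char> data;   // block contents
    bool dirty;               // must be written back before eviction
};

class LruBlockCache {
public:
    // A capacity of 0 is bumped to 1 so that eviction always has a victim.
    explicit LruBlockCache(std::size_t capacity) : capacity_(capacity == 0 ? 1 : capacity) {}

    // Returns the cached block for `offset`, loading it on a miss.
    Block& get(std::uint64_t offset) {
        auto it = index_.find(offset);
        if (it != index_.end()) {
            // Hit: move the node to the head of the list (most recently used).
            lru_.splice(lru_.begin(), lru_, it->second);
            return *it->second;
        }
        if (lru_.size() == capacity_) {
            evict();  // buffer full: swap out the least recently used block
        }
        lru_.push_front(load(offset));
        index_[offset] = lru_.begin();
        return lru_.front();
    }

    bool contains(std::uint64_t offset) const { return index_.count(offset) != 0; }
    std::size_t size() const { return lru_.size(); }

private:
    // The tail of the list is the least recently used block.
    void evict() {
        Block& victim = lru_.back();
        if (victim.dirty) {
            writeBack(victim);
        }
        index_.erase(victim.offset);
        lru_.pop_back();
    }

    // Placeholder for a disk read at `offset` (hypothetical; no real I/O here).
    Block load(std::uint64_t offset) {
        return Block{offset, std::vector<char>(kBlockSize, 0), false};
    }

    // Placeholder for a disk write (hypothetical; no real I/O here).
    void writeBack(const Block&) {}

    std::size_t capacity_;
    std::list<Block> lru_;  // doubly linked list; front = most recently used
    std::unordered_map<std::uint64_t, std::list<Block>::iterator> index_;  // offset -> node
};
```

With a capacity of 2, accessing offsets 0, 4096, 0, 8192 in that order evicts block 4096: the second access to block 0 moves it back to the head, leaving 4096 at the tail. Keeping list iterators in the map lets both the lookup and the move-to-head step run in constant time, since `std::list::splice` relinks the node without copying the block.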