{"id":15041121,"url":"https://github.com/vkayy/vkdb","last_synced_at":"2025-10-05T02:24:36.549Z","repository":{"id":273630474,"uuid":"896973068","full_name":"vkayy/vkdb","owner":"vkayy","description":"A time series database engine in C++.","archived":false,"fork":false,"pushed_at":"2025-02-19T23:56:45.000Z","size":6049,"stargazers_count":12,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-14T19:50:05.012Z","etag":null,"topics":["cpp","database","time-series"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vkayy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-12-01T18:51:02.000Z","updated_at":"2025-03-02T22:40:55.000Z","dependencies_parsed_at":"2025-01-22T03:24:12.036Z","dependency_job_id":"c98dc4e4-6e20-4fcd-b171-bbf83f4852b0","html_url":"https://github.com/vkayy/vkdb","commit_stats":null,"previous_names":["vkayy/vkdb"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/vkayy/vkdb","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vkayy%2Fvkdb","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vkayy%2Fvkdb/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vkayy%2Fvkdb/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vkayy%2Fvkdb/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vkayy","download_url":"https://codeload.github.com/vka
yy/vkdb/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vkayy%2Fvkdb/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278399691,"owners_count":25980334,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-05T02:00:06.059Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cpp","database","time-series"],"created_at":"2024-09-24T20:45:37.464Z","updated_at":"2025-10-05T02:24:36.514Z","avatar_url":"https://github.com/vkayy.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ca id=\"readme-top\"\u003e\u003c/a\u003e\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"docs/images/vkdb-full-cropped.svg\" alt=\"logo\" width=\"250\" height=auto /\u003e\n  \u003cp\u003eA time series database engine built in C++ with minimal dependencies.\u003c/p\u003e\n  \u003ca href=\"https://cplusplus.com/\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/C%2B%2B-%2300599C?style=flat-square\u0026logo=cplusplus\u0026logoColor=ffffff\" alt=\"C++\" /\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/vkayy/vkdb/graphs/contributors\"\u003e\n    \u003cimg src=\"https://img.shields.io/github/contributors/vkayy/vkdb?style=flat-square\" alt=\"contributors\" /\u003e\u003c/a\u003e\n  \u003ca href=\"\"\u003e\n    \u003cimg 
src=\"https://img.shields.io/github/last-commit/vkayy/vkdb?style=flat-square\" alt=\"last update\" /\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/vkayy/vkdb/issues/\"\u003e\n    \u003cimg src=\"https://img.shields.io/github/issues/vkayy/vkdb?style=flat-square\" alt=\"open issues\" /\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/vkayy/vkdb/blob/main/LICENSE\"\u003e\n    \u003cimg src=\"https://img.shields.io/github/license/vkayy/vkdb?style=flat-square\" alt=\"license\" /\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/vkayy/vkdb/stargazers\"\u003e\n    \u003cimg src=\"https://img.shields.io/github/stars/vkayy/vkdb?style=flat-square\" alt=\"stars\" /\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/vkayy/vkdb/network/members\"\u003e\n    \u003cimg src=\"https://img.shields.io/github/forks/vkayy/vkdb?style=flat-square\" alt=\"forks\" /\u003e\u003c/a\u003e\n  \u003ch4\u003e\n    \u003ca href=\"https://vkayy.github.io/vkdb\"\u003eDocumentation\u003c/a\u003e\n  \u003c/h4\u003e\n\u003c/div\u003e\n\n\u003e [!NOTE]\n\u003e Development has slowed for now, as I'm a little busier with work and university!\n\n\u003e [!WARNING]\n\u003e vkdb is currently in the early stages of development and is not yet ready for daily use!\n\n**vkdb** is a hobbyist time series database engine built with a focus on simplicity and modernity. 
Motivated by unfamiliar architectures and endless optimisation opportunities, this project is far from commercial, and is defined by a pursuit of challenge.\n\n# Table of contents\n\n[Internals](#internals)\n- [Database engine](#database-engine)\n- [Query processing](#query-processing)\n\n[Running locally (not needed)](#running-locally-not-needed)\n- [Installation](#installation)\n- [Tests](#tests)\n- [Benchmarks](#benchmarks)\n- [Examples](#examples)\n\n[Usage](#usage)\n- [Setup](#setup)\n- [Interface](#interface)\n- [Table management](#table-management)\n- [General queries](#general-queries)\n- [Playground](#playground)\n- [Mock data](#mock-data)\n\n[Working with vq](#working-with-vq)\n- [Table management](#table-management-1)\n- [Data manipulation](#data-manipulation)\n- [Errors](#errors)\n- [EBNF](#ebnf)\n\n[License](#license)\n\n[Authors](#authors)\n\n[Credits](#credits)\n\n\u003cp align=\"right\"\u003e\u003ca href=\"#readme-top\"\u003eback to top\u003c/a\u003e\u003c/p\u003e\n\n## Internals\n\n### Database engine\n\n#### Architecture\n\nvkdb is built on log-structured merge (LSM) trees. In their simplest form, these have an in-memory layer and a disk layer, paired with a write-ahead log (WAL) for persistence of in-memory changes.\n\nWhen you instantiate a `vkdb::Database`, all of the prior in-memory information (in-memory layer, metadata, etc.) will be loaded in if the database already exists, and if not, a new one is set up. This persists on disk until you clear it via `vkdb::Database::clear`.\n\nIt's best to make all interactions via `vkdb::Database`, or the `vkdb::Table` type via `vkdb::Database::getTable`, unless you just want to play around with vq (more on the playground [here](#playground)).\n\n\u003e [!NOTE]\n\u003e Make sure the `$HOME` environment variable is set correctly, as all database files will be stored in `.vkdb` within your home directory. 
Only tamper with this directory if moving databases between machines!\n\n![database engine internals](docs/images/database-engine-internals.png)\n\n\u003cp align=\"right\"\u003e\u003ca href=\"#readme-top\"\u003eback to top\u003c/a\u003e\u003c/p\u003e\n\n#### Compaction\n\nThe LSM tree uses time-window compaction to efficiently organise and merge SSTables across different layers (C0-C7). Each layer has a specific time window size and maximum number of SSTables.\n\n| Layer | Time Window | Max. SSTables |\n|-------|------------|--------------|\n| C0 | Overlapping | 32 |\n| C1 | 1 day | 2,048 |\n| C2 | 1 week | 1,024 |\n| C3 | 1 month | 512 |\n| C4 | 3 months | 256 |\n| C5 | 6 months | 128 |\n| C6 | 1 year | 64 |\n| C7 | 3 years | 32 |\n\nWhen the memtable fills up, it's flushed to C0 as an SSTable. C0 acts as a buffer for the later layers, and when it exceeds its SSTable limit, all the SSTables are merged into C1 at once, with each SSTable spanning a 1-day window.\n\nWhen any other layer exceeds its SSTable limit, only its oldest, excess SSTables are merged with the next layer's SSTables based on the layer's time window. For example, if C1 has too many SSTables:\n1. The oldest SSTables from C1 are selected.\n2. Any overlapping SSTables in C2 are identified based on 1-week time windows.\n3. The selected SSTables are merged into new SSTables in C2.\n4. 
Original SSTables are removed after successful merge.\n\n![compaction internals](docs/images/compaction-internals.png)\n\nThis time-window compaction strategy enables:\n- Fast queries, as SSTables beyond C0 are disjoint and only intersecting ranges need to be scanned.\n- Efficient storage, as older data is consolidated into larger chunks whilst recent data stays granular.\n- Reduced write amplification, with C0 as a buffer and merges occurring on progressively larger time windows.\n\n\u003cp align=\"right\"\u003e\u003ca href=\"#readme-top\"\u003eback to top\u003c/a\u003e\u003c/p\u003e\n\n### Query processing\n\nLexing is done quite typically, with enumerated token types and line/column number stored for error messages. Initially, I directly executed queries as string streams, but that was a nightmare for robustness.\n\nIn terms of parsing, vq has been constructed to have an LL(1) grammar—this meant I could write a straightforward recursive descent parser for the language. This directly converts queries to an abstract syntax tree (AST) with `std::variant`.\n\nFinally, the interpreter makes quick use of the AST via the visitor pattern, built into C++ with `std::variant` (mentioned earlier) and `std::visit`. 
This ended up making the interpreter (and pretty-printer) very satisfying to write.\n\n![query processing internals](docs/images/query-processing-internals.png)\n\n\u003cp align=\"right\"\u003e\u003ca href=\"#readme-top\"\u003eback to top\u003c/a\u003e\u003c/p\u003e\n\n## Running locally (not needed)\n\n### Installation\n\nFirst, clone the project and `cd` into the directory.\n```\ngit clone https://github.com/vkayy/vkdb.git \u0026\u0026 cd vkdb\n```\nThen, simply run the build script.\n```\n./build.sh\n```\n\n\u003cp align=\"right\"\u003e\u003ca href=\"#readme-top\"\u003eback to top\u003c/a\u003e\u003c/p\u003e\n\n### Tests\n\nYou can use the `-t` flag to run the tests.\n```\n./build.sh -t\n```\n\n\u003cp align=\"right\"\u003e\u003ca href=\"#readme-top\"\u003eback to top\u003c/a\u003e\u003c/p\u003e\n\n### Benchmarks\n\nYou can also use the `-b` flag to run the benchmarks.\n```\n./build.sh -b\n```\n\n\u003cp align=\"right\"\u003e\u003ca href=\"#readme-top\"\u003eback to top\u003c/a\u003e\u003c/p\u003e\n\n### Examples\n\nFinally, you can use the `-e` flag to run any of the examples.\n```\n./build.sh -e \u003cfilename\u003e\n```\n\n\u003cp align=\"right\"\u003e\u003ca href=\"#readme-top\"\u003eback to top\u003c/a\u003e\u003c/p\u003e\n\n## Usage\n\n### Setup\nAdd this to your `CMakeLists.txt` file. It lets you use vkdb by fetching the most recent version into your project's build.\n\n```cmake\ninclude(FetchContent)\nFetchContent_Declare(\n    vkdb\n    GIT_REPOSITORY https://github.com/vkayy/vkdb.git\n    GIT_TAG        main\n)\nFetchContent_MakeAvailable(vkdb)\ntarget_link_libraries(${PROJECT_NAME} vkdb)\n```\n\n\u003cp align=\"right\"\u003e\u003ca href=\"#readme-top\"\u003eback to top\u003c/a\u003e\u003c/p\u003e\n\n### Interface\nSimply include the database header, and you'll have access to the database API.\n\n```cpp\n#include \u003cvkdb/database.h\u003e\n\nint main() {\n  vkdb::Database db{\"example-db\"};\n  db.createTable(\"example-table\");\n  // 
...\n}\n```\n\n\u003e [!CAUTION]\n\u003e Do not instantiate multiple databases with the same name, nor a single database with the name `interpreter_default` (more on this database [here](#playground)). As these instances have in-memory components, this can cause unexpected behaviour if they become out of sync (and they likely will).\n\n\u003cp align=\"right\"\u003e\u003ca href=\"#readme-top\"\u003eback to top\u003c/a\u003e\u003c/p\u003e\n\n### Table management\n\nYou can manipulate tables with the database API, using either methods or queries.\n\n```cpp\ndb.createTable(\"sensor_data\")\n  .addTagColumn(\"location\")\n  .addTagColumn(\"type\");\n\ndb.run(\"REMOVE TAGS type FROM sensor_data;\");\n```\n\n\u003e [!IMPORTANT]\n\u003e When a table has been populated, it can no longer have its tag columns modified unless you call `vkdb::Table::clear`.\n\n\u003cp align=\"right\"\u003e\u003ca href=\"#readme-top\"\u003eback to top\u003c/a\u003e\u003c/p\u003e\n\n### General queries\n\nWith the database API, you can run queries via strings, files, and the REPL.\n\n```cpp\ntest_db\n  .run(\"CREATE TABLE temp TAGS tag1, tag2;\")\n  .runFile(std::filesystem::current_path() / \"../examples/vq_setup.vq\")\n  .runPrompt()\n  .clear();\n```\n\n\u003cp align=\"right\"\u003e\u003ca href=\"#readme-top\"\u003eback to top\u003c/a\u003e\u003c/p\u003e\n\nWith the table API, you can run queries via the query builder.\n\n```cpp\nauto sum{table_replay.query()\n  .whereTimestampBetween(0, 999)\n  .whereMetricIs(\"metric\")\n  .whereTagsContain({\"tag1\", \"value1\"})\n  .sum()\n};\n```\n\n\u003cp align=\"right\"\u003e\u003ca href=\"#readme-top\"\u003eback to top\u003c/a\u003e\u003c/p\u003e\n\n### Playground\n\nYou can also play around with vq by running `vkdb::VQ::run...()`. 
This operates on a reserved database called `interpreter_default`.\n\n```cpp\n#include \u003cvkdb/vq.h\u003e\n\nint main() {\n  vkdb::VQ::runPrompt();\n}\n```\n\n| ![vq-playground.png](docs/images/vq-playground.png) | \n|:--:| \n| *The vq playground REPL.* |\n\nThis is generally for experimental purposes; there's not much to gain from it in practice besides having a playground.\n\n\u003cp align=\"right\"\u003e\u003ca href=\"#readme-top\"\u003eback to top\u003c/a\u003e\u003c/p\u003e\n\n### Mock data\n\nFeel free to use `vkdb::random\u003c\u003e`. Any arithmetic type (with no cv- or ref-qualifiers) can be passed in as a template argument, and you can optionally pass in a lower and upper bound (inclusive).\n\n```cpp\nauto random_int{vkdb::random\u003cint\u003e(-100'000, 100'000)};\nauto random_double{vkdb::random\u003cdouble\u003e(-10.0, 10.0)};\n```\n\n\u003cp align=\"right\"\u003e\u003ca href=\"#readme-top\"\u003eback to top\u003c/a\u003e\u003c/p\u003e\n\n## Working with vq\n\n### Table management\n\nHere are some table management queries.\n\n```sql\nCREATE TABLE climate TAGS region, season;\n\nDROP TABLE devices;\n\nADD TAGS host, status TO servers;\n\nREMOVE TAGS host FROM servers;\n```\n\n\u003cp align=\"right\"\u003e\u003ca href=\"#readme-top\"\u003eback to top\u003c/a\u003e\u003c/p\u003e\n\n### Data manipulation\n\nHere are some data manipulation queries.\n```sql\nSELECT DATA status FROM sensors ALL;\n\nSELECT AVG temperature FROM weather BETWEEN 1234 AND 1240 WHERE city=london, unit=celsius;\n\nPUT temperature 1234 23.5 INTO weather TAGS city=paris, unit=celsius;\n\nDELETE rainfall 1234 FROM weather TAGS city=tokyo, unit=millimetres;\n```\n\n\u003cp align=\"right\"\u003e\u003ca href=\"#readme-top\"\u003eback to top\u003c/a\u003e\u003c/p\u003e\n\n### Errors\n\nThere are two kinds of errors you can get: parse errors, raised while a query is being parsed, and runtime errors, raised while it is being executed.\n\n| ![vq-errors.png](docs/images/vq-errors.png) | \n|:--:| 
\n| *A parse error and a runtime error in the REPL.* |\n\n\u003cp align=\"right\"\u003e\u003ca href=\"#readme-top\"\u003eback to top\u003c/a\u003e\u003c/p\u003e\n\n### EBNF\n\nHere's the EBNF grammar encapsulating vq.\n\n```bnf\n\u003cexpr\u003e ::= {\u003cquery\u003e \";\"}+\n\n\u003cquery\u003e ::= \u003cselect_query\u003e | \u003cput_query\u003e | \u003cdelete_query\u003e | \u003ccreate_query\u003e  | \u003cdrop_query\u003e | \u003cadd_query\u003e | \u003cremove_query\u003e | \u003ctables_query\u003e\n\n\u003cselect_query\u003e ::= \"SELECT\" \u003cselect_type\u003e \u003cmetric\u003e \"FROM\" \u003ctable_name\u003e \u003cselect_clause\u003e\n\n\u003cselect_type\u003e ::= \"DATA\" | \"AVG\" | \"SUM\" | \"COUNT\" | \"MIN\" | \"MAX\"\n\n\u003cselect_clause\u003e ::= \u003call_clause\u003e | \u003cbetween_clause\u003e | \u003cat_clause\u003e\n\n\u003call_clause\u003e ::= \"ALL\" {\u003cwhere_clause\u003e}?\n\n\u003cbetween_clause\u003e ::= \"BETWEEN\" \u003ctimestamp\u003e \"AND\" \u003ctimestamp\u003e {\u003cwhere_clause\u003e}?\n\n\u003cat_clause\u003e ::= \"AT\" \u003ctimestamp\u003e {\u003cwhere_clause\u003e}?\n\n\u003cwhere_clause\u003e ::= \"WHERE\" \u003ctag_list\u003e\n\n\u003cput_query\u003e ::= \"PUT\" \u003cmetric\u003e \u003ctimestamp\u003e \u003cvalue\u003e \"INTO\" \u003ctable_name\u003e {\"TAGS\" \u003ctag_list\u003e}?\n\n\u003cdelete_query\u003e ::= \"DELETE\" \u003cmetric\u003e \u003ctimestamp\u003e \"FROM\" \u003ctable_name\u003e {\"TAGS\" \u003ctag_list\u003e}?\n\n\u003ccreate_query\u003e ::= \"CREATE\" \"TABLE\" \u003ctable_name\u003e {\"TAGS\" \u003ctag_list\u003e}?\n\n\u003cdrop_query\u003e ::= \"DROP\" \"TABLE\" \u003ctable_name\u003e\n\n\u003cadd_query\u003e ::= \"ADD\" \"TAGS\" \u003ctag_columns\u003e \"TO\" \u003ctable_name\u003e\n\n\u003cremove_query\u003e ::= \"REMOVE\" \"TAGS\" \u003ctag_columns\u003e \"FROM\" \u003ctable_name\u003e\n\n\u003ctables_query\u003e ::= \"TABLES\"\n\n\u003ctag_list\u003e ::= \u003ctag\u003e {\",\" 
\u003ctag\u003e}*\n\n\u003ctag\u003e ::= \u003ctag_key\u003e \"=\" \u003ctag_value\u003e\n\n\u003ctag_columns\u003e ::= \u003ctag_key\u003e {\",\" \u003ctag_key\u003e}*\n\n\u003ctag_key\u003e ::= \u003cidentifier\u003e\n\n\u003ctag_value\u003e ::= \u003cidentifier\u003e\n\n\u003cmetric\u003e ::= \u003cidentifier\u003e\n\n\u003ctable_name\u003e ::= \u003cidentifier\u003e\n\n\u003ctimestamp\u003e ::= \u003cnumber\u003e\n\n\u003cvalue\u003e ::= \u003cnumber\u003e\n\n\u003cidentifier\u003e ::= \u003cchar\u003e {\u003cchar\u003e | \u003cdigit\u003e}*\n\n\u003cnumber\u003e ::= {\"-\"}? \u003cdigit\u003e+ {\".\" \u003cdigit\u003e+}?\n\n\u003cchar\u003e ::= \"A\" | ... | \"Z\" | \"a\" | ... | \"z\" | \"_\"\n\n\u003cdigit\u003e ::= \"0\" | \"1\" | ... | \"9\"\n```\n\n\u003cp align=\"right\"\u003e\u003ca href=\"#readme-top\"\u003eback to top\u003c/a\u003e\u003c/p\u003e\n\n## Authors\n\n[Vinz Kakilala](https://linkedin.com/in/vinzkakilala) (me)\n\n\u003cp align=\"right\"\u003e\u003ca href=\"#readme-top\"\u003eback to top\u003c/a\u003e\u003c/p\u003e\n\n## License\n\nDistributed under the MIT License. See [LICENSE](https://github.com/vkayy/vkdb/blob/main/LICENSE) for more information.\n\n\u003cp align=\"right\"\u003e\u003ca href=\"#readme-top\"\u003eback to top\u003c/a\u003e\u003c/p\u003e\n\n## Credits\n\nUsed [MurmurHash3](https://github.com/aappleby/smhasher/blob/master/src/MurmurHash3.cpp) for the Bloom filters. Fast, uniform, and deterministic.\n\n\u003cp align=\"right\"\u003e\u003ca href=\"#readme-top\"\u003eback to top\u003c/a\u003e\u003c/p\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvkayy%2Fvkdb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvkayy%2Fvkdb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvkayy%2Fvkdb/lists"}