{"id":13761157,"url":"https://github.com/Gundolf68/tstdb","last_synced_at":"2025-05-10T12:31:03.093Z","repository":{"id":217230594,"uuid":"373792634","full_name":"Gundolf68/tstdb","owner":"Gundolf68","description":"An efficient Ternary Search Tree and persistent database for LuaJIT","archived":false,"fork":false,"pushed_at":"2021-06-11T08:44:38.000Z","size":152,"stargazers_count":18,"open_issues_count":0,"forks_count":3,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-11-16T18:34:21.767Z","etag":null,"topics":["database","lua","luajit","ternary-search-tree"],"latest_commit_sha":null,"homepage":"","language":"Lua","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Gundolf68.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-06-04T09:38:23.000Z","updated_at":"2023-03-05T17:18:21.000Z","dependencies_parsed_at":null,"dependency_job_id":"df5b3344-dc45-40ed-8399-f84a14582814","html_url":"https://github.com/Gundolf68/tstdb","commit_stats":null,"previous_names":["gundolf68/tstdb"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Gundolf68%2Ftstdb","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Gundolf68%2Ftstdb/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Gundolf68%2Ftstdb/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Gundolf68%2Ftstdb/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Gundolf68","download_url":"
https://codeload.github.com/Gundolf68/tstdb/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253415288,"owners_count":21904825,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["database","lua","luajit","ternary-search-tree"],"created_at":"2024-08-03T13:01:41.053Z","updated_at":"2025-05-10T12:31:02.786Z","avatar_url":"https://github.com/Gundolf68.png","language":"Lua","funding_links":[],"categories":["Lua"],"sub_categories":[],"readme":"# tstdb\nAn efficient Ternary Search Tree and persistent database for LuaJIT\n\nTernary search trees are a somewhat underrated data structure. This is due, among other things, to the often naïve implementation. Thus, in many cases the following structure is chosen for the nodes (in C): \n```C\ntypedef struct sNode Node;\nstruct sNode { char splitchar; char flag; Node *high; Node *low; Node *equal; };\n```\nOn a 64-bit system, each node has a size of 32 bytes. As we will see, this size can be easily halved. Also, in many cases, memory is allocated individually for each node during insertion, which is very inefficient. The solution to both problems is to use an array-based tree where the low/equal/high structure members represent array indices:\n```C\ntypedef struct { char splitchar; char flag; uint32_t high; uint32_t low; uint32_t equal; } Node;\n```\nThis reduces the node size to 16 bytes and we can allocate memory for the array of nodes in advance. 
With a uint32_t as index, 2^32 nodes can be addressed. At 16 bytes per node that corresponds to 64 GB of memory, which is a lot of capacity, since Ternary Search Trees are very space efficient. A German dictionary with 356008 words and an average word length of 12 bytes requires 780953 nodes, i.e. about 2.2 nodes per word. German words can be very long, for example \"Telekommunikationsüberwachungsverordnung\", but because of the shared prefixes, adding the plural \"Telekommunikationsüberwachungsverordnung**en**\" consumes only 2 new nodes.\n\n### Basic usage\n```Lua\n-- import\nlocal TSTDB = require(\"tstdb\")\n\n-- create an instance\nlocal db = TSTDB()\n\n-- insert some keys\ndb.put(\"bananas\")\ndb.put(\"apples\")\ndb.put(\"cherries\")\n\n-- check if a key exists\nif db.get(\"apples\") then print(\"apples!\") end\n\n-- print all keys in sorted order\ndb.keys(function(key) print(key) end)\n\n-- shorter version:\ndb.keys(print)\n\n-- print all keys in descending order\ndb.keys(print, true)\n\n-- search for keys with a pattern\ndb.search(\"ba*\", function(key) print(key) end)\n\n-- search for keys with a more challenging pattern\ndb.search(\"*rr*s\", print)\n\n-- remove a key\ndb.remove(\"apples\")\n\n-- remove keys with a pattern\ndb.search(\"ba*\", db.remove)\n\n-- print the number of keys\nprint(db.key_count())\n\n-- print the number of nodes\nprint(db.node_count())\n\n-- reset the tree\ndb.clear()\n```\nThe put method returns a boolean value indicating whether the key was added (true) or already present (false). The remove method returns a boolean value in the same way, indicating whether the key was actually removed.\n\n### Persistence\nTo make the tree persistent, a filename can be passed to the constructor:\n```Lua\nlocal db, err = TSTDB(\"fruits.db\")\nif not db then\n    -- your error handling here\n    print(err)\n    return\nend\n-- your code here\ndb.close()\n```\nWhen using a persistent tree, it is important to check the return value of the constructor, since various errors can occur when working with files (no write permissions, corrupt database files, etc.). 
If the file exists, its content is loaded; otherwise it is created. All changes (put, remove, optimize) are written to the file immediately. The database is fail-safe: even in the event of a power failure or program crash during a write operation, the database is automatically repaired at the next startup. The database format is human readable so that it can be edited with a text editor:\n```\nTSTDB\n7       bananas\n6       apples\n8       cherries\n-6      apples\n```\nA database file starts with the header \"TSTDB\" in the first line. Each entry starts with the length (in bytes) of the key, followed by a tab (ASCII 9) and the key. If the length is negative, the following key is removed.\n\n### Optimization\nTernary Search Trees are sensitive to the order of the inserted words: if you insert the keys in sorted order you end up with a long skinny tree. You can check the state of the tree with the state method:\n```Lua\nprint(db.state())\n```\nThis method returns a number between 0 (completely unbalanced) and 1 (completely balanced). The optimize method rebuilds the tree by reinserting all keys in random order:\n```Lua\ndb.optimize()\n```\nIt is also useful to call this method when you have removed many keys. Note that the number of nodes always remains the same, no matter in which order the keys are inserted.\nThe tree can be displayed using the dump method:\n```Lua\nlocal db = TSTDB()\ndb.put(\"banana\")\ndb.put(\"apples\")\ndb.put(\"bananas\")\ndb.dump()\n```\nOutput:\n```\nnode\tchar\tlow\tequal\thigh\tflag\n1\t'b'\t7\t2\t0\t0\n2\t'a'\t0\t3\t0\t0\n3\t'n'\t0\t4\t0\t0\n4\t'a'\t0\t5\t0\t0\n5\t'n'\t0\t6\t0\t0\n6\t'a'\t0\t13\t0\t1\n7\t'a'\t0\t8\t0\t0\n8\t'p'\t0\t9\t0\t0\n9\t'p'\t0\t10\t0\t0\n10\t'l'\t0\t11\t0\t0\n11\t'e'\t0\t12\t0\t0\n12\t's'\t0\t0\t0\t1\n13\t's'\t0\t0\t0\t1\n```\n### Use as database\nTernary Search Trees are underestimated mainly because they are usually only used as a sorted set. However, they can be used very efficiently as a (simple) database. 
Here is a small example:\n```Lua\nlocal TSTDB = require(\"tstdb\")\n\nlocal db = TSTDB()\n-- insert the first user\ndb.put(\"/users/walter/\")\ndb.put(\"/users/walter/password/secret123\")\ndb.put(\"/users/walter/group/admin\")\ndb.put(\"/users/walter/hobbies/cooking\")\ndb.put(\"/users/walter/hobbies/counting money\")\ndb.put(\"/users/walter/friends/jesse\")\n-- insert a second user\ndb.put(\"/users/jesse/\")\ndb.put(\"/users/jesse/password/verysecret\")\ndb.put(\"/users/jesse/group/standard\")\ndb.put(\"/users/jesse/hobbies/sleeping\")\ndb.put(\"/users/jesse/hobbies/party\")\n```\nThe character '/' as path separator has (for now) no special meaning for the TST - you can use any char.\n\nNow some queries. Suppose a user wants to log in:\n```Lua\nif db.get(\"/users/\" .. name .. \"/password/\" .. password) then\n    print(\"login ok\")\nelse\n    print(\"login failed\")\nend\n```\nTo query all users, we use the search method, which takes a text pattern with one or more wildcards and a callback function as parameters:\n```Lua\ndb.search(\"/users/*/\", function(key) print(key) end)\n-- shorter:\ndb.search(\"/users/*/\", print)\n```\nThis returns all results in alphabetical order:\n```\n/users/jesse/  \n/users/walter/\n```\nIf only the username and not the whole key is needed, then the search method can be called with a third parameter that selects the segment. 
The default separator is '/' (so the slash has a special meaning after all - at least for the search method) but you can set the separator to any other char:\n```Lua\ndb.search(\"/users/*/\", print, 2)\n-- print the current separator\nprint(db.separator())\n-- set a dot as new separator\ndb.separator(\".\")\nprint(db.separator())\n```\nOutput:\n```\njesse  \nwalter\n/\n.\n```\nNext query: Count all users in the \"admin\" group:\n```Lua\nlocal count = 0\ndb.search(\"/users/*/group/admin\", function() count = count + 1 end)\nprint(count)\n```\nSearch all users who like to cook and are in the admin group:\n```Lua\ndb.search(\"/users/*/hobbies/cooking\", function(name) \n    if db.get(\"/users/\" .. name .. \"/group/admin\") then print(name) end \nend, 2)\n```\nIf the number of users that like to cook is much larger than the number of admins, then the other way around is better:\n```Lua\ndb.search(\"/users/*/group/admin\", function(name) \n    if db.get(\"/users/\" .. name .. \"/hobbies/cooking\") then print(name) end \nend, 2)\n```\nSearch all friends of Walter who have at least one hobby in common with him:\n```Lua\ndb.search(\"/users/walter/friends/*\", function(friend)\n    db.search(\"/users/\" .. friend .. \"/hobbies/*\", function(hobby)\n        if db.get(\"/users/walter/hobbies/\" .. hobby) then\n            print(friend .. \" -\u003e \" .. 
hobby)\n        end\n    end, 4) \nend, 4)\n```\n### Pitfalls\n#### Multiple wildcards\nBe careful when searching with multiple wildcards:\n```Lua\ndb.put(\"bananas\")\ndb.search(\"*an*s\", print)\n```\nThis prints the same key twice:\n```\nbananas\nbananas\n```\nThis is not an error: the tree traverses every possible way of matching the pattern, here b**an**ana**s** and ban**an**a**s**.\n#### Updating database entries\nSuppose we want to change jesse's group from \"standard\" to \"admin\" in the user database above. The old key \"/users/jesse/group/standard\" must then be removed; otherwise he will be a member of both the \"standard\" and \"admin\" groups at the same time (which can, of course, also be intentional).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FGundolf68%2Ftstdb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FGundolf68%2Ftstdb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FGundolf68%2Ftstdb/lists"}