{"id":16759552,"url":"https://github.com/cyrildever/treee","last_synced_at":"2025-03-21T23:31:59.890Z","repository":{"id":54442864,"uuid":"269047651","full_name":"cyrildever/treee","owner":"cyrildever","description":"Fast indexing engine for data identified by hashed id and stored in an immutable file","archived":false,"fork":false,"pushed_at":"2024-09-11T15:01:14.000Z","size":814,"stargazers_count":8,"open_issues_count":0,"forks_count":2,"subscribers_count":1,"default_branch":"master","last_synced_at":"2024-10-14T04:08:27.936Z","etag":null,"topics":["file","golang","hash","http-server","immutable","microservice","search-engine"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cyrildever.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-06-03T09:37:34.000Z","updated_at":"2024-09-11T15:06:09.000Z","dependencies_parsed_at":"2022-08-13T15:50:28.093Z","dependency_job_id":"9616b24d-4cf6-40aa-8a2f-d34520ba6834","html_url":"https://github.com/cyrildever/treee","commit_stats":null,"previous_names":[],"tags_count":43,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cyrildever%2Ftreee","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cyrildever%2Ftreee/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cyrildever%2Ftreee/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cyrildever%2Ftreee/manifests","owner_url":"https://repos.ec
osyste.ms/api/v1/hosts/GitHub/owners/cyrildever","download_url":"https://codeload.github.com/cyrildever/treee/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":221820552,"owners_count":16886202,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["file","golang","hash","http-server","immutable","microservice","search-engine"],"created_at":"2024-10-13T04:08:26.205Z","updated_at":"2025-03-21T23:31:59.883Z","avatar_url":"https://github.com/cyrildever.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# treee\n_Fast indexing engine for data identified by hashed id and stored in an immutable file_\n\n![GitHub tag (latest by date)](https://img.shields.io/github/v/tag/cyrildever/treee)\n![GitHub last commit](https://img.shields.io/github/last-commit/cyrildever/treee)\n![GitHub issues](https://img.shields.io/github/issues/cyrildever/treee)\n![GitHub](https://img.shields.io/github/license/cyrildever/treee)\n\nThis is the Go implementation of the Treee\u0026trade; indexing engine.\nIt's both a [library](#usage) to use as a module in your Go projects and an [executable](#executable) to run as a micro-service over HTTP.\n\n\n### Motivation\n\nThe challenge was to set up a powerful and safe search engine to use when the data is some linked list of items that could be themselves connected to each other in subchains, indexed through their identifiers that are only made of hashed values (like SHA-256 string representations), and all stored in an immutable file.\n\nIts best application is for a 
Blockchain file where an item is a transaction embedding a smart contract, and each subchain of items represents the subsequent uses and/or modifications of this smart contract.\nAs such, the Treee\u0026trade; index is currently used in the [Rooot\u0026trade;](https://rooot.io) blockchain.\n\n\n### Formal description\n\nWe define here an indexing algorithm for a system whose identifiers are hashed values, designed to be efficient for both writing and reading.\n\n#### 1) How the index works\n\nThe Treee\u0026trade; index is constructed as an acyclic graph (a tree). Each node contains either the addresses of its child nodes (its sons) or a set of *Leafs*, a *Leaf* holding the information needed to retrieve one or more linked items.\n\nThe number of sons of a node is deterministic and depends on the depth of the node in the tree. We denote by \u003cimg src=\"https://latex.codecogs.com/gif.latex?p_{1}\" /\u003e the number of sons of the nodes at depth 1, \u003cimg src=\"https://latex.codecogs.com/gif.latex?p_{2}\" /\u003e the number of sons of the nodes at depth 2, ..., \u003cimg src=\"https://latex.codecogs.com/gif.latex?p_{k}\" /\u003e the number of sons at depth \u003cimg src=\"https://latex.codecogs.com/gif.latex?k\" /\u003e.\n\nThe goal is to create a balanced tree whose width adapts to reduce depth and optimize performance. 
We are looking to index numbers, in this case the numerical value of the items' unique identifiers.\n\nLet's explain the course of the index.\n\nFor the binary tree, we write the number in binary form (for example, \u003cimg src=\"https://latex.codecogs.com/gif.latex?4\\text{\u0026space;=\u0026space;}\" /\u003e `100`) which indicates its position in the tree.\n\nAt the step \u003cimg src=\"https://latex.codecogs.com/gif.latex?i\" /\u003e, we pass to the child \u003cimg src=\"https://latex.codecogs.com/gif.latex?0\" /\u003e if the \u003cimg src=\"https://latex.codecogs.com/gif.latex?i^{th}\" /\u003e bit is `0`, otherwise we pass to the child \u003cimg src=\"https://latex.codecogs.com/gif.latex?1\" /\u003e if the \u003cimg src=\"https://latex.codecogs.com/gif.latex?i^{th}\" /\u003e bit is `1`. We stop when the node is a *Leaf*.\n\nFor the tree, we build a representation of this number by a sequence of numbers and we traverse the tree in the same way. At the step \u003cimg src=\"https://latex.codecogs.com/gif.latex?i\" /\u003e of depth \u003cimg src=\"https://latex.codecogs.com/gif.latex?k\" /\u003e, we pass to the child \u003cimg src=\"https://latex.codecogs.com/gif.latex?0\" /\u003e if the \u003cimg src=\"https://latex.codecogs.com/gif.latex?i^{th}\" /\u003e representative is `0`, we pass to the child \u003cimg src=\"https://latex.codecogs.com/gif.latex?1\" /\u003e if the \u003cimg src=\"https://latex.codecogs.com/gif.latex?i^{th}\" /\u003e representative is `1`, ..., we pass to the child \u003cimg src=\"https://latex.codecogs.com/gif.latex?p_{k_i}\u0026space;-\u0026space;1\" /\u003e if the \u003cimg src=\"https://latex.codecogs.com/gif.latex?i^{th}\" /\u003e representative is \u003cimg src=\"https://latex.codecogs.com/gif.latex?p_{k_i}\u0026space;-\u0026space;1\" /\u003e. We stop when the node is a *Leaf*.\n\nTo construct the representation of a number, we will successively take the modulo of prime numbers. 
According to the Chinese remainder theorem, each number smaller than the product of the primes used has a unique representative that can be written as the sequence of these modulos. Indeed, a number \u003cimg src=\"https://latex.codecogs.com/gif.latex?n\" /\u003e can be written in the following form: \u003cimg src=\"https://latex.codecogs.com/gif.latex?n\u0026space;\\mapsto\u0026space;(n\\\u0026space;mod\\\u0026space;p_{i},\\text{\u0026space;where\u0026space;}p_{i}\\text{\u0026space;is\u0026space;the\u0026space;}i^{th}\\text{\u0026space;prime\u0026space;number})\" /\u003e\n\nThe value of the identifier of the item (its number) is denoted \u003cimg src=\"https://latex.codecogs.com/gif.latex?n\" /\u003e and the modulating number \u003cimg src=\"https://latex.codecogs.com/gif.latex?M\" /\u003e. Modulos are calculated in \u003cimg src=\"https://latex.codecogs.com/gif.latex?O(1)\" /\u003e for fixed-sized integers. Since multiplication is faster than division (which the calculation of the modulo requires), one can instead multiply by the floating-point reciprocal of \u003cimg src=\"https://latex.codecogs.com/gif.latex?M\" /\u003e: \u003cimg src=\"https://latex.codecogs.com/gif.latex?n\u0026space;-\u0026space;M\u0026space;\\times\u0026space;\\text{\u0026space;int}\\left(n\\times\u0026space;1\u0026space;/\u0026space;M\\right)\" /\u003e.\n\nGiven the random nature of the numbers (or pseudo-random, since the identifiers of the items are generated by cryptographic hashing technologies), the tree is balanced. To unbalance the tree in a malicious way, it would be necessary to be able to generate hashes whose modulo follows a particular trajectory. 
However, the difficulty of such an operation increases exponentially (of the order of \u003cimg src=\"https://latex.codecogs.com/gif.latex?exp(plog(p))\" /\u003e where \u003cimg src=\"https://latex.codecogs.com/gif.latex?p\" /\u003e is the depth).\nAs a reminder, the product of the first 16 prime numbers equals \u003cimg src=\"https://latex.codecogs.com/gif.latex?32,589,158,477,190,044,730\u0026space;\\simeq\u0026space;3\u0026space;\\times\u0026space;10^{19}\" /\u003e.\nTherefore, as soon as the index contains a reasonably large amount of data, unbalancing the tree in a malicious way becomes increasingly impractical, if possible at all.\n\nA *Leaf* contains the following list of information about an item:\n * Identifier of the current item as a hash string;\n * Position: start address of the current item in the file;\n * Size: the size (in bytes) of the saved item in the file;\n * Origin: unique identifier of the item that is at the origin of the item's subchain;\n * Previous: unique identifier of the previous item chained to it;\n * Next: optionally, unique identifier of the next item chained.\n\n A *Leaf* whose next item field is empty is the last item in the subchain.\n \n A *Leaf* whose origin item field is equal to the identifier of the current item is necessarily the origin of the subchain. As such, it is handled in a particular way: if one or more items follow it, the last item of the subchain is identified here as the previous item. 
The last three fields of the *Leaf* therefore correspond to a circular linked list.\n\n#### 2) Using the index\n\nTo add an element to the index:\n* The new *Leaf* is written in the index;\n* We update the 'Next' field of the *Leaf* that previously corresponded to the last item of the subchain;\n* We modify the 'Previous' field of the *Leaf* of the origin item by writing the identifier of the current item.\n\nTo read/search an item in the index:\n* We find in the tree the *Leaf* corresponding to the identifier of the searched item:\n  * If the 'Next' field of the *Leaf* is empty, this is the last item of the subchain;\n  * Otherwise, we go to the next step;\n* We find the *Leaf* corresponding to the identifier of the field 'Origin';\n* We use the 'Previous' field of this *Leaf* to find the last item of the subchain.\n\nWhen using the index, we can see that we perform at most 3 reads or 3 writes, each requiring an index traversal of \u003cimg src=\"https://latex.codecogs.com/gif.latex?O(log(n))\" /\u003e order, where \u003cimg src=\"https://latex.codecogs.com/gif.latex?n\" /\u003e is the number of items in the index.\n\n\nFor more details, feel free to read the full [white paper](documentation/src/latex/treee_whitepaper.pdf).\n\n\n### Usage\n\n```console\n$ go get github.com/cyrildever/treee\n```\n\n```golang\nimport (\n  \"github.com/cyrildever/treee/core/index\"\n  \"github.com/cyrildever/treee/core/index/branch\"\n  \"github.com/cyrildever/treee/core/model\"\n)\n\n// Instantiate default index (could be any prime number up to the 1000th existing prime number,\n// but should be much lower to better leverage the solution, and if 0 will use the default INIT_PRIME value)\ntreee, err := index.New(index.INIT_PRIME)\nif err != nil {\n  // Handle error\n}\n\n// Add to index\nleaf := branch.Leaf{\n  ID: model.Hash(\"1234567890abcdef\"),\n  Position: 0,\n  Size: 100,\n}\nerr = treee.Add(leaf)\nif err != nil {\n  // Handle error\n}\n\n// Search\nif found, err := treee.Search(leaf.ID); 
err == nil {\n  // Do something with found Leaf\n}\n```\n\nFor better performance, you should put your search requests in different goroutines (see `GetLeaf()` implementation in [api/handlers/leaf.go](api/handlers/leaf.go) file for example).\n\nFor debugging or storage purposes, you might want to use the `PrintAll()` method on the `Treee` index to print all recorded leaves to a writer (passing it `true` as argument for beautifying the printed JSON, or `false` for the raw string).\n```golang\n// To print to Stdout\nfmt.Println(treee.PrintAll(true))\n```\n\nPersistence of the index is achieved through the use of the automatic save made upon insertion, and the use of the `Load()` function instead of Treee instantiation with `New()` at start-up.\n```golang\ntreee, err := index.Load(\"path/to/treee.json\") // If empty, will use \"./saved/treee.json\"\n```\nIt could be disabled using the corresponding environment variable or flag in the command line, or even programmatically:\n```golang\ntreee.UsePersistence(false) // If you're positive you don't want it\n```\n\n\n### Executable\n\nYou can simply build the executable and start an instance of the Treee\u0026trade; indexing engine.\n\n```console\n$ git clone https://github.com/cyrildever/treee.git \u0026\u0026 cd treee \u0026\u0026 go build\n$ ./treee -t.port 7001 -t.host localhost -t.init 101\n```\n\n```\nUsage of ./treee:\n  -t.file string\n        File path to an existing index\n  -t.host string\n        Host address (default \"0.0.0.0\")\n  -t.init string\n        Initial prime number to use for the index (default \"0\")\n  -t.persist\n        Activate persistence (default true)\n  -t.port string\n        HTTP port number (default \"7000\")\n```\n\n##### Environment variables\n\nIf set, the following environment variables will override any corresponding default configuration or flag passed with the command line:\n- `HOST`: the host address;\n- `HTTP_PORT`: the HTTP port number to use;\n- `INDEX_PATH`: the path to the 
index file in JSON format;\n- `INIT_PRIME`: the initial prime number (note that it won't have any effect if using a file because the latter will prevail);\n- `USE_PERSISTENCE`: set to `false` to disable saving the index to a file.\n\n##### API\n\nThe following endpoints are available under the `/api` group:\n\n* `DELETE /leaf`\n\nThis endpoint removes the passed items from the index.\n\nIt expects an array of IDs as `ids` query argument, e.g.\n```http\nDELETE /api/leaf?ids=1235467890abcdef[...]\u0026ids=fedcba0987654321[...]\nUser-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)\nHost: treee.io\nAccept-Language: fr-FR\nAccept-Encoding: gzip, deflate\nAccept: application/json\n```\n\nIt returns a `204` status code if all passed items were removed, or a `200` status code along with the following JSON list of undeleted IDs if some weren't removed:\n```json\n{\n  \"ids\": [\"fedcba0987654321[...]\"]\n}\n```\n\n* `GET /leaf`\n\nThis endpoint searches items based on the passed IDs.\n\nIt expects an array of IDs as `ids` query argument and an optional `takeLast` boolean (defaults to `false`), e.g. `http://localhost:7000/api/leaf?ids=1234567890abcdef[...]\u0026ids=fedcba0987654321[...]\u0026takeLast=true`\n\nIt returns a status code `200` along with a JSON array respecting the following format:\n```json\n[\n  {\n    \"id\": \"1234567890abcdef[...]\",\n    \"position\": 0,\n    \"size\": 100,\n    \"origin\": \"1234567890abcdef[...]\",\n    \"previous\": \"1234567890abcdef[...]\",\n    \"next\": \"\"\n  },\n  [...]\n]\n```\n\nIf no item was found, it returns a `404` status code with an empty body.\n\n* `GET /line`\n\nThis endpoint returns all the IDs in the same subchain/line.\n\nIt expects any ID of a line as `id` query argument, e.g. 
`http://localhost:7000/api/line?id=1234567890abcdef[...]`\n\nIt returns a status code `200` along with the following sorted JSON array of IDs (index `0` being the origin):\n```json\n[\n  \"1234567890abcdef[...]\",\n  \"fedcba0987654321[...]\"\n]\n```\n\nIf no item was found, it returns a `404` status code with an empty body.\n\n* `POST /leaf`\n\nThis endpoint adds an item to the index.\n\nIt expects the following JSON object as body, the first three fields being mandatory:\n```json\n{\n  \"id\": \"\u003cThe item ID as a hash string representation\u003e\",\n  \"position\": 0,\n  \"size\": 100,\n  \"previous\": \"\u003cAn optional item ID of the previous item in the current subchain if any\u003e\"\n}\n```\n\nIt returns a status code and the following object as JSON:\n```json\n{\n  \"code\": 200 | 303 | 400 | 404 | 412 | 500,\n  \"result\": \"\u003cThe inserted item ID if success\u003e\",\n  \"error\": \"\u003cThe error message if failed\u003e\"\n}\n```\n\nThe list of status codes (and their meaning) is as follows:\n  - `200`: item inserted;\n  - `303`: item already exists (not updated as the file is supposed to be immutable);\n  - `400`: wrong parameter (missing item, missing mandatory field, etc.);\n  - `404`: passed previous item not found;\n  - `412`: something in the passed data caused the server to fail (incorrect JSON format, ...);\n  - `500`: an error occurred on the server.\n\n\n### Performances\n\nOn average, a basic machine should be able to ingest over 100 million new records and handle about 5 billion search queries per hour.\n\nAs an example, on an Apple MacBook Pro 2.3 GHz Intel Core i9 with 16 GB DDR4 RAM clocked at 2400 MHz, I observed the following performance when using `101` as init prime:\n- insertion: 10,000 additions in ~300ms (120 million per hour);\n- search: 1,000,000 requests in ~500ms, i.e. approx. 
2 MHz (over 7 billion per hour).\nAnd on an Apple iMac 3.1 GHz Intel Core i5 with 16 GB DDR4 RAM clocked at 2667 MHz also using `101` as init prime:\n- insertion: 10,000 additions in ~240ms (150 million per hour);\n- search: 1,000,000 requests in ~750ms, i.e. approx. 1.3 MHz (over 4.8 billion per hour).\n\n\n### License\n\nThis module is distributed under an MIT license. \\\nSee the [LICENSE](LICENSE) file.\n\n\n\u003chr /\u003e\n\u0026copy; 2018-2025 Cyril Dever. All rights reserved.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcyrildever%2Ftreee","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcyrildever%2Ftreee","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcyrildever%2Ftreee/lists"}