{"id":18584494,"url":"https://github.com/bertrand31/damysos","last_synced_at":"2025-08-17T16:17:28.700Z","repository":{"id":170071149,"uuid":"185511834","full_name":"Bertrand31/Damysos","owner":"Bertrand31","description":"🌍 An experimental data structure allowing lightning-fast, constant-time lookups of large datasets for neighboring multi-dimensional points","archived":false,"fork":false,"pushed_at":"2023-07-05T09:20:32.000Z","size":15844,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-05-16T05:37:51.635Z","etag":null,"topics":["coordinates","data-structures","experiment","functional-programming","gps","performance","scala","trie"],"latest_commit_sha":null,"homepage":"","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Bertrand31.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2019-05-08T02:20:03.000Z","updated_at":"2023-07-07T19:29:27.000Z","dependencies_parsed_at":"2023-10-20T18:23:00.135Z","dependency_job_id":null,"html_url":"https://github.com/Bertrand31/Damysos","commit_stats":null,"previous_names":["bertrand31/damysos"],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/Bertrand31/Damysos","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Bertrand31%2FDamysos","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Bertrand31%2FDamysos/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Bertrand31%2FDamysos/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Bertrand3
1%2FDamysos/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Bertrand31","download_url":"https://codeload.github.com/Bertrand31/Damysos/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Bertrand31%2FDamysos/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":270871658,"owners_count":24660242,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-17T02:00:09.016Z","response_time":129,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["coordinates","data-structures","experiment","functional-programming","gps","performance","scala","trie"],"created_at":"2024-11-07T00:27:41.789Z","updated_at":"2025-08-17T16:17:23.572Z","avatar_url":"https://github.com/Bertrand31.png","language":"Scala","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Damysos\n\n![GitHub release](https://img.shields.io/github/release/Bertrand31/Damysos.svg)\n![GitHub Release Date](https://img.shields.io/github/release-date/Bertrand31/Damysos.svg)\n[![codecov](https://codecov.io/gh/Bertrand31/Damysos/branch/master/graph/badge.svg)](https://codecov.io/gh/Bertrand31/Damysos)\n[![TravisCI](https://api.travis-ci.com/Bertrand31/Damysos.svg?branch=master)](https://travis-ci.com/Bertrand31/Damysos)\n[![Codacy 
Badge](https://api.codacy.com/project/badge/Grade/b19c781500ef4434af54a6699892efcf)](https://www.codacy.com/app/bertrandjun/Damysos)\n![GitHub issues](https://img.shields.io/github/issues/Bertrand31/Damysos.svg)\n![GitHub](https://img.shields.io/github/license/Bertrand31/Damysos.svg)\n\n- [Overview](#overview)\n- [Performance](#performance)\n- [Usage](#usage)\n- [Caveats](#caveats)\n\n## Overview\n\nDamysos is an experiment stemming from the idea that tries could be used to store points in an\nn-dimensional space in a way that allows fast querying of \"neighboring\" points.\n\nTo achieve this, we first turn each coordinate of every point into a base-4 number such that the\ncloser two points are, the more leading characters (read from left to right) their encoded\ncoordinates share.\n\nFor example, if point A's encoded abscissa coordinate is \"111\", point B's is \"112\" and point C's is\n\"100\", we can tell that point A is closer to point B than it is to point C, and that point C is\ncloser to point A than it is to point B (along the aforementioned abscissa).\n\nThis way, to get the neighboring points of a coordinate, we only have to compute the \"trie path\"\nfor that coordinate and descend the trie to the desired depth (the level of precision, or\n\"zoom\"). Then, we collect all the leaves below that node.\n\nThis would work well if we were storing one-dimensional points. In our case, however, we chose to\nstore GPS coordinates, which are by nature two-dimensional. To go from 1-dimensional to\nn-dimensional coordinates while maintaining the same level of performance, I had to come up with\nan n-dimensional trie. It is basically a trie in which each node holds an n-dimensional array\nwith one slot for each possible value of an n-dimensional path step.\n\nBecause this may seem abstract, it helps to compare it to a regular trie: in a regular trie,\na \"path\" would be a word. 
For the word \"foo\", the path would be `List(\"f\", \"o\", \"o\")`.\nIn a 2-dimensional trie, a path would instead look something like this: `List((1, 6), (4, 2), ...)`.\nNotice that we now have tuples, because each step of the path is 2-dimensional. I have replaced\ncharacters with numbers simply because, at this point, we're already far enough from the usual\nword-searching use case of a trie that we can stop pretending :)\n\nThe implementation of this n-dimensional trie is found in\n[GeoTrie.scala](src/main/scala/damysos/GeoTrie.scala). However, for the sake of simplicity and\nbecause of the specificity of our use case (GPS coordinates), GeoTrie is actually\na 2-dimensional trie. In the future, I might extract it into a separate project and make it truly\nn-dimensional (taking n as a constructor parameter), but as far as Damysos is concerned, there's no\npoint in doing that.\n\nWhat makes this approach interesting, in my opinion, is that nowhere in the code are we actually\ncomparing GPS coordinates, calculating distances, etc. The data structure itself, in this case a\ntrie, _is_ the logic.\n\n## Performance\n\nHere are the results of running the `PerfSpec` class on a laptop with an\n_Intel Core i7-1065G7 CPU @ 1.30GHz_ on a dataset of **23 435 958** points:\n```text\n============================\nProfiling Damysos search:\nCold run        24 142 ns\nMax hot         34 109 ns\nMin hot         363 ns\nMed hot         392 ns\n\n============================\nProfiling Linear search:\nCold run        189 494 799 ns\nMax hot         177 992 346 ns\nMin hot         149 712 474 ns\nMed hot         150 693 588 ns\n\n384422 times faster\n```\nAs you can see, it is orders of magnitude faster than a linear search. This is because the Damysos\ntrie has a fixed height: the amount of data we add to it has no influence over its depth or\nstructure. 
Retrieving neighbouring points takes constant time (`Θ(1)`), so the more data we have,\nthe bigger the gap with the naïve linear search.\n\nThe speed of that search, however, depends on the level of precision (or \"zoom\") you want to\nachieve. Although it may seem counterintuitive, a lower precision actually means a longer query\ntime. This is because, if we are using tries 10 levels deep and we ask for a precision of 5,\nwe'll descend 5 levels of the trie (very fast, and tail-recursive) and then explore all the branches\nbelow that node to collect every point underneath it (that's the slower part).\nHence, the lower the precision, the fewer levels we descend before we start exploring sub-tries,\nand the more branches we have to explore from that node.\n\n## Usage\n\nFirst, create a Damysos instance. Then, feed it `PointOfInterest` instances to add data to it:\n```scala\nimport damysos.{Damysos, PointOfInterest}\n\nval damysos = Damysos()\n// Coordinates are given as (latitude, longitude)\nval paris = PointOfInterest(\"Paris\", Coordinates(43.2D, -80.38333D))\nval toulouse = PointOfInterest(\"Toulouse\", Coordinates(43.60426D, 1.44367D))\nval pointsOfInterest = Seq(paris, toulouse)\nval bayonne = PointOfInterest(\"Bayonne\", Coordinates(43.48333D, -1.48333D))\n// Add a collection of points, then a single point\ndamysos ++ pointsOfInterest + bayonne\n```\nThe `++` method accepts a `TraversableOnce` argument, which means you can feed it either a regular\ncollection (like the `Seq` above) or a lazy `Iterator`:\n```scala\nimport damysos.PointOfInterest\n\nval data: Iterator[PointOfInterest] = PointOfInterest.loadFromCSV(\"cities_world.csv\")\ndamysos ++ data // Lines will be pulled one by one from the CSV file to be added to the Damysos\n```\nFrom there, we can start querying our data structure:\n```scala\ndamysos.contains(bayonne)\n\ndamysos.findSurrounding(paris)\n```\nNote that `findSurrounding` also takes an optional `precision` parameter to adjust the \"zoom\" level:\n```scala\ndamysos.findSurrounding(paris, precision = 4)\n```\nIt also supports removing a single 
element or a `TraversableOnce` of elements:\n```scala\ndamysos - paris // Remove a single point\ndamysos -- data // Remove several points (an Iterator can only be traversed once, so reload it if already consumed)\n```\nIt can also return all of its contents as an `Array` and, lastly, count the number of elements\nit contains:\n```scala\ndamysos.toArray\ndamysos.size\n```\n\n## Caveats\n\nBecause of the way tries work and the way coordinates are encoded, when we're nearing a \"breakoff\npoint\" of the base we have chosen, the trie won't \"see\" anything that is geographically close but\nwhose key falls right after that breakoff point.\n\nFor example, the paths \"233\" and \"300\" have nothing in common as far as a trie is concerned, and yet,\nas base-4 numbers, they are numerically contiguous (47 and 48 in decimal), which means the points\nthey represent are very close.\n\nAs a result, Damysos will sometimes give incomplete results, and will be \"blind\" to everything\nthat lies after or before the aforementioned \"breakoff points\".\n\n**This is why Damysos' goal is not to reliably provide exhaustive results, but rather to return _some_\nneighboring points, as quickly as possible.**\n\nFor this reason, it should be considered a probabilistic data structure: the surrounding points that\nget returned are definitely neighbours of the coordinates you entered (no false positives); however,\nthose results may very well be incomplete (false negatives).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbertrand31%2Fdamysos","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbertrand31%2Fdamysos","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbertrand31%2Fdamysos/lists"}