https://github.com/StefanSalewski/RTree

Generic RTree for Nim
https://github.com/StefanSalewski/RTree

bulk-load dynamic generic k-nearest-neighbor nim nim-lang nim-language r-tree spatial-index

Last synced: 15 days ago
JSON representation

Generic RTree for Nim

Host: GitHub
URL: https://github.com/StefanSalewski/RTree
Owner: StefanSalewski
License: mit
Created: 2017-12-08T20:40:00.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2021-04-03T10:26:25.000Z (about 4 years ago)
Last Synced: 2025-05-05T20:25:32.061Z (17 days ago)
Topics: bulk-load, dynamic, generic, k-nearest-neighbor, nim, nim-lang, nim-language, r-tree, spatial-index
Language: Nim
Homepage: http://ssalewski.de/tmp/rtree.html
Size: 29.3 KB
Stars: 26
Watchers: 2
Forks: 2
Open Issues: 1
Metadata Files:
- Readme: README.adoc
- License: LICENSE

Awesome Lists containing this project

awesome-nim - RTree - Generic R-tree implementation for Nim. (Data / Data Structures)

README

        = R-Trees

(c) Stefan Salewski                                     

:experimental:

:imagesdir: http://ssalewski.de/tmp

:source-highlighter: pygments

:pygments-style: monokai

:icons: font

:toc: left

== Pure Nim generic R-Tree and R*-Tree

R-Trees are dynamic tree data structures for spatial indexing/searching.

image::NimRTree.png[]

Typical objects to locate are rectangles, polygons, circles or other arbitrary shapes.

While used mostly to locate objects in two-dimensional space, this module supports

arbitrary dimensions with coordinates types including int and float.

Incremental insertion and deletion of objects is supported, as well as range search

(objects overlapping with a query box) and incremental nearest neighbor search. Additional

bulk loading of an initial set of objects is supported, giving shorter tree construction times,

lower memory consumption and faster access times. Bulk loaded trees remain fully dynamic. The

basic R-Tree as well as the improved R*-Tree is supported.

Locating objects works with bounding boxes, rectangles in 2D. Each element has a bounding box,

and for searching we specify a query box. The boxes are an array of `Ext[RT] = tuple[a, b: RT]`,

with an array element for each dimension, and `Ext` is a tuple of a range type RT like int or float.

For the range itself it is important that `a <= b`.

The search() proc returns all objects overlapping (partly) with a query box. Maybe later we can add

one more proc which returns only objects which are fully contained in the query box. (That is very easy --

the difficult part is finding good names for additional procs.)

The nearest() procs and iterators gets not a box, but a point called BoxCenter as parameter

and return elements nearest to the query point.

We can use a plain R-Tree or the advanced R*-Tree to store our objects. Most of the time

we will use the R*-Tree, because it's optimized splitting algorithm should give

better performance. First generic parameter of new() proc is the maximal number

of childs per node. 8 is a common value, 2 is min for plain R-Tree, 4 is min for

R*-Tree. Maximal number of children is 100, but it is not really limited. Next

three generic parameters are dimension >= 1, coordinates type and object type.

Additional you can pass the minimal fill factor, which is a percent value between

30 and 50. All these parameters can have an effect on performance and memory consumption,

but general the effects are small. The bulk loading procs gets a seq containing the

data objects as a parameter. Generally bulk loading is faster than single inserts and

should give an optimized tree -- optimized for performance and memory consumption.

Bulk loaded trees are also fully dynamic, you can later insert() and delete() elements.

Bulk loading should have advantages if elements are available early, and when the

tree is mostly static. Inserting new elements into a bulk loaded tree may be not

extremely fast, because nodes of bulk loaded trees are totally filled, so later inserts()

leads to splits, which takes some time. So for very dynamic data bulk loading may not

be really the best solution.

For nearest neighbor search we have a plain proc which returns a single element nearest

to a query point, and an iterator which can return multiple elements. Both proc and

iterator are available in a simple version which takes into account only the bounding

boxes of the objects, and a version which takes a dist() proc as additional parameter.

The dist() proc can estimate the distance more accurate, but may be slower.

NOTE: The dist() proc has to return the SQUARED distance from object to query point,

and this distance value has always to be not negative and not greater than the squared distance from

the query point to the bounding box of the object. The later condition is true for

each meaningful dist() function, the relation allows pruning of objects first by

plain bounding box checks, resulting in improved performance. The squared distance is

used because this can avoid sqrt() calculations in many cases -- for the circle dist()

function this is not true.

Most procs for insertion, deletion and location of objects accepts a tuple consisting

of the bounding box and the object itself as parameter or return this tuple type.

This type `L[D; RT; LT] = tuple[b: Box[D, RT], l: LT]` is sometimes called record.

And exception is the proc searchObj() which returns only the object itself, not the box.

This is to save costly seq operations when only the objects itself are requested.

=== API

http://ssalewski.de/tmp/rtree.html

The API may change in later versions of this software!

=== Install

----

nimble install rtree

----

=== Examples

To give an not too trivial example, we will use some circles in 2D with float coordinates.

We will define a proc box() for our circles, which gives us the bounding

rectangle, and a proc dist() which gives the distance of a circle from a query point.

[[test.nim]]

[source,nim]

.test.nim

----

# nim c test.nim

import rtree

from math import sqrt, `^`

type

  Circ = object

    x, y, r: float

# squared distance from query point to circle

proc dist(qo: BoxCenter[2, float]; l: L[2, float, Circ]): float =

  let c = l.l

  max(0, math.sqrt((qo[0] - c.x) ^ 2 + (qo[1] - c.y) ^ 2) - c.r) ^ 2

# bounding box of circle

proc box(c: Circ): Box[2, float] =

  result[0].a = c.x - c.r

  result[0].b = c.x + c.r

  result[1].a = c.y - c.r

  result[1].b = c.y + c.r

proc test =

  const P = [(9, 3, 2), (-2, 2, 1), (-5, -3, 2), (4, -2, 1)]

  var s: seq[L[2, float, Circ]] # buffer for bulk load

  var t = newRStarTree[8, 2, float, Circ]()

  var el: L[2, float, Circ]

  for p in P:

    let c = Circ(x: p[0].float, y: p[1].float, r: p[2].float)

    el = (box(c), c)

    s.add(el)

    t.insert(el)

  echo "circ in first quadrant:"

  echo t.search([(0.0, 12.0), (0.0, 12.0)])

  echo "circs below x axis"

  echo t.search([(-20.0, 20.0), (-12.0, 0.0)])

  echo "empty area"

  echo t.search([(0.0, 5.0), (0.0, 5.0)])

  echo "search a single circle located closest to origin using only bounding boxes"

  echo t.findNearestBox(BoxCenter[2, float]([0.0, 0.0]))

  echo "query circles sorted along x-axis, using only bounding boxes"

  for el in t.findNearestBox(BoxCenter[2, float]([-100.0, 0.0])): # sort along x axis

    echo el.l

  echo "search a single circle located closest to origin using exact dist() function"

  echo t.findNearest(BoxCenter[2, float]([0.0, 0.0]), dist)

  echo "query circles sorted along x-axis, using dist() function"

  for el in t.findNearest(BoxCenter[2, float]([-100.0, 0.0]), dist):

    echo el.l

  # and we can delete objects:

  assert(t.delete(s[0]))

  assert(not t.delete(s[0])) # that object was already deleted

  var bt = newRStarTree[8, 2, float, Circ](s) # a bulk loaded R-Tree, should work like t

  echo "done!"

test()

----

=== Testing

The rtee.nim module contains testing code, which is executed when you compile and run

it directly as main module. When you compile with option `-d:cairoGTK` a GUI window

is opened showing the tree of rectangles. For the later you have to install gintro

to enable cairo and GTK support. All testing has been done only for 2D yet -- bugs

for other dimensions should be easy to fix when they exist.

=== Performance

All the used algorithm where described in detail in the papers listed below and implemented in

Nim language, so performance is expected to be good and state of the art. The module is currently used

most of the time only with a few thousand objects, for example to locate objects on a 2D canvas.

For this usecase performance is more than sufficient. When you intent to use the module for

very large data sets, then we recommend that you do some benchmarking and profiling yourself. You may compare

plain R-Tree to R*-Tree, test impact of bulk loading, and test various number of child nodes and

fill factor. A simple way to improve performance may be to mark some of the smaller procs with

{.inline.} pragma -- this should be not necessary when you compile your app with -flto option to

enable link-time-optimization.

=== References

* https://en.wikipedia.org/wiki/R-tree[Wikipedia R-Tree]

* http://www-db.deis.unibo.it/courses/SI-LS/papers/Gut84.pdf[original R-Tree by Guttman]

* http://dbs.mathematik.uni-marburg.de/publications/myPapers/1990/BKSS90.pdf[improved R*-Tree]

* http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.386.8193&rep=rep1&type=pdf[incremental k nearest neighbor search]

* http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.20.9245&rep=rep1&type=pdf[sketch of TGS bulk loading] 

* http://www.cs.umd.edu/~hjs/pubs/AlborziIPL07.pdf[detailed algorithm of TGS bulk loading]

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/StefanSalewski/RTree

Awesome Lists containing this project

README