{"id":21039013,"url":"https://github.com/VectorElectron/structron","last_synced_at":"2025-10-01T04:30:42.928Z","repository":{"id":236431908,"uuid":"792601226","full_name":"Image-Py/nbstl","owner":"Image-Py","description":"stl container for numba","archived":false,"fork":false,"pushed_at":"2024-04-30T08:49:51.000Z","size":18,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-11-19T13:46:28.971Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Image-Py.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-04-27T03:19:09.000Z","updated_at":"2024-04-30T08:49:54.000Z","dependencies_parsed_at":"2024-11-19T13:53:51.360Z","dependency_job_id":null,"html_url":"https://github.com/Image-Py/nbstl","commit_stats":null,"previous_names":["image-py/nbstl"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Image-Py%2Fnbstl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Image-Py%2Fnbstl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Image-Py%2Fnbstl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Image-Py%2Fnbstl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Image-Py","download_url":"https://codeload.github.com/Image-Py/nbstl/tar.gz/refs/heads/main","host":{"name":"GitHub","
url":"https://github.com","kind":"github","repositories_count":234821236,"owners_count":18892008,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-19T13:37:14.716Z","updated_at":"2025-10-01T04:30:42.919Z","avatar_url":"https://github.com/Image-Py.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# structron\n\nI have recently been writing some computational geometry code. Data structures such as heaps, stacks, queues, and trees are used frequently in computational geometry, but they cannot be expressed directly as vector operations in NumPy, and Numba currently provides few advanced data structures. I am therefore developing a set of Numba-based containers to support computational geometry algorithms. structron comes from \"Numba's STL container.\"\n\n## Container Objects\nNumba provides `TypedList`, which is designed primarily for general-purpose workflows and offers methods such as `insert`, `pop`, and `append`. For specific problems, however, it may not offer an efficient specialized data structure (this is a conjecture and has not been tested). structron therefore implements specialized containers for specific problem domains.\n\n## Features\n\n1. **Seamless Integration with NumPy**:  \n   All containers are implemented based on `ndarray`, ensuring seamless integration with various data processing scenarios. 
This makes them ideal for numerical computing and data analysis tasks.\n\n2. **Efficient Linear Table Implementation**:  \n   Linear tables (e.g., `TypedMemory`) use an array-based `next` column to replace traditional pointers. This ensures data is always stored in order, and released slots are efficiently reused in a circular manner.\n\n3. **Pure Array-Based Red-Black Tree**:  \n   The Red-Black Tree is implemented purely using arrays, with `left` and `right` columns recording the indices of corresponding nodes in the array. This design eliminates the need for dynamic memory allocation and pointer management, improving performance and memory efficiency.\n\n4. **Outstanding Performance**:  \n   Extensive testing shows that the performance of `Hash` and `RedBlackTree` is comparable to C++'s STL. This makes the containers suitable for high-performance computing and real-time applications.\n\n### Why Choose These Containers?\n- **Memory Efficiency**: Leveraging `ndarray` and array-based designs minimizes memory overhead.\n- **High Performance**: Optimized implementations ensure competitive performance with low-level languages.\n- **Flexibility**: Supports both standalone and shared memory modes, catering to diverse use cases.\n- **Ease of Use**: Designed to work seamlessly with Python's scientific computing ecosystem.\n\n## `TypedMemory` Container\n\n`TypedMemory` is a high-performance memory management container for custom data types, built with `numba`. 
It dynamically manages memory allocation and deallocation, maintaining a free list for efficient reuse of released slots and automatically expanding capacity when necessary.\n\n### Example Code\n\n```python\nimport numpy as np\nimport numba as nb\nimport structron\n\n# Custom point structure\nt_point = np.dtype([('x', np.float32), ('y', np.float32)])\n\n# Build two points\np1, p2 = np.array([(1,1), (2,2)], t_point)\n\n# Define a TypedMemory class for the custom dtype\nPointMemory = structron.TypedMemory(t_point)\n\n# Instantiate a memory pool with a capacity of 10\npoints = PointMemory(10)\n\n# Allocate memory and store a value\ni = points.push(p1)  # Returns the index of the allocated slot\n\n# Access the value at index `i`\npoints[i]  # Returns (x=1.0, y=1.0)\n\n# Deallocate the memory at index `i`\npoints.pop(i)  # Frees the slot for reuse\n\n# Get container size\npoints.size  # Current number of allocated elements\nlen(points)  # Total capacity of the memory pool\n```\n\n### Methods\n\n- `TypedMemory(dtype)`: Defines a memory pool class for the specified `dtype`.\n- `push(value)`: Allocates a slot and stores `value`. Returns the index.\n- `pop(index)`: Deallocates the slot at `index` and adds it to the free list.\n- `__getitem__(index)`: Retrieves the value at `index`.\n- `size`: Returns the current number of allocated slots.\n\n\n## `TypedDeque` Container\n\n`TypedDeque` is a double-ended queue (deque) implementation for custom data types, optimized with `numba`. 
It allows efficient insertion and removal of elements at both ends.\n\n### Example Code\n\n```python\nimport numpy as np\nimport numba as nb\nimport structron\n\n# Custom point structure\nt_point = np.dtype([('x', np.float32), ('y', np.float32)])\n\n# Build two points\np1, p2 = np.array([(1,1), (2,2)], t_point)\n\n# Define a TypedDeque class for the custom dtype\nPointDeque = structron.TypedDeque(t_point)\n\n# Instantiate a deque with a capacity of 10\npoints = PointDeque(10)\n\n# Push elements\npoints.push_front(p1)  # Push to the front\npoints.push_back(p2)   # Push to the back\n\n# Access elements\npoints.first()  # Get the first element\npoints.last()   # Get the last element\n\n# Pop elements\npoints.pop_front()  # Remove the first element\npoints.pop_back()   # Remove the last element\n\n# Get container size\npoints.size  # Current number of elements\nlen(points)  # Total capacity of the deque\n```\n\n### Methods\n\n- `TypedDeque(dtype)`: Defines a deque class for the specified `dtype`.\n- `push_front(value)`: Adds `value` to the front of the deque.\n- `push_back(value)`: Adds `value` to the back of the deque.\n- `first()`: Returns the first element in the deque.\n- `last()`: Returns the last element in the deque.\n- `pop_front()`: Removes and returns the first element.\n- `pop_back()`: Removes and returns the last element.\n- `size`: Returns the current number of elements in the deque.\n\n\n## `TypedStack` Container\n\n`TypedStack` is a stack implementation for custom data types, optimized with `numba`. 
It follows the Last-In-First-Out (LIFO) principle.\n\n### Example Code\n\n```python\nimport numpy as np\nimport numba as nb\nimport structron\n\n# Custom point structure\nt_point = np.dtype([('x', np.float32), ('y', np.float32)])\n\n# Build two points\np1, p2 = np.array([(1,1), (2,2)], t_point)\n\n# Define a TypedStack class for the custom dtype\nPointStack = structron.TypedStack(t_point)\n\n# Instantiate a stack with a capacity of 10\npoints = PointStack(10)\n\n# Push elements\npoints.push(p1)  # Push to the stack\npoints.push(p2)  # Push another element\n\n# Access the top element\npoints.top()  # Returns (x=2.0, y=2.0)\n\n# Pop the top element\npoints.pop()  # Removes and returns (x=2.0, y=2.0)\n\n# Get container size\npoints.size  # Current number of elements in the stack\nlen(points)  # Same as `size`, not the total capacity\n```\n\n### Methods\n\n- `TypedStack(dtype)`: Defines a stack class for the specified `dtype`.\n- `push(value)`: Adds `value` to the top of the stack.\n- `top()`: Returns the top element without removing it.\n- `pop()`: Removes and returns the top element.\n- `size`: Returns the current number of elements in the stack.\n\n\n## `TypedQueue` Container\n\n`TypedQueue` is a queue implementation for custom data types, optimized with `numba`. 
It follows the First-In-First-Out (FIFO) principle.\n\n### Example Code\n\n```python\nimport numpy as np\nimport numba as nb\nimport structron\n\n# Custom point structure\nt_point = np.dtype([('x', np.float32), ('y', np.float32)])\n\n# Build two points\np1, p2 = np.array([(1,1), (2,2)], t_point)\n\n# Define a TypedQueue class for the custom dtype\nPointQueue = structron.TypedQueue(t_point)\n\n# Instantiate a queue with a capacity of 10\npoints = PointQueue(10)\n\n# Push elements\npoints.push(p1)  # Add to the queue\npoints.push(p2)  # Add another element\n\n# Access the front element\npoints.top()  # Returns (x=1.0, y=1.0)\n\n# Pop the front element\npoints.pop()  # Removes and returns (x=1.0, y=1.0)\n\n# Get container size\npoints.size  # Current number of elements in the queue\n```\n\n### Methods\n\n- `TypedQueue(dtype)`: Defines a queue class for the specified `dtype`.\n- `push(value)`: Adds `value` to the back of the queue.\n- `top()`: Returns the front element without removing it.\n- `pop()`: Removes and returns the front element.\n- `size`: Returns the current number of elements in the queue.\n\n\n## `TypedHeap` Container\n\n`TypedHeap` is a heap (priority queue) implementation for custom data types, optimized with `numba`. 
It maintains elements in a way that allows efficient access to the smallest (or largest) element.\n\n### Example Code\n\n```python\nimport numpy as np\nimport numba as nb\nimport structron\n\n# Define a TypedHeap class for int32\nIntHeap = structron.TypedHeap(np.int32)\n\n# Instantiate a heap with a capacity of 10\nheap = IntHeap(10)\n\n# Push elements\nheap.push(1)  # Add to the heap\nheap.push(2)  # Add another element\n\n# Access the top element\nheap.top()  # Returns the smallest element (1)\n\n# Pop the top element\nheap.pop()  # Removes and returns the smallest element (1)\n\n# Get container size\nheap.size  # Current number of elements in the heap\nlen(heap)  # Same as `size`\n```\n\n### Methods\n\n- `TypedHeap(dtype)`: Defines a heap class for the specified `dtype`.\n- `push(value)`: Adds `value` to the heap.\n- `top()`: Returns the smallest element without removing it.\n- `pop()`: Removes and returns the smallest element.\n- `size`: Returns the current number of elements in the heap.\n\n## `TypedHash` Container\n\n`TypedHash` is a hash-based container for custom data types, optimized with `numba`. 
It provides efficient insertion, deletion, and lookup operations.\n\n### Example Code\n\n```python\nimport numpy as np\nimport numba as nb\nimport structron\n\n# Define a TypedHash class for int32\nIntHash = structron.TypedHash(np.int32)\n\n# Instantiate a hash container with a capacity of 10\nhashset = IntHash(10)\n\n# Insert an element\nhashset.push(4)  # Add to the hash container\n\n# Check if an element exists\nhashset.has(4)  # Returns True if the element is present\n\n# Remove an element\nhashset.pop(4)  # Removes the element from the container\n\n# Get container size\nhashset.size  # Current number of elements in the container\nlen(hashset)  # Same as `size`\n```\n\n### Methods\n\n- `TypedHash(dtype)`: Defines a hash container class for the specified `dtype`.\n- `push(value)`: Adds `value` to the hash container.\n- `has(value)`: Checks if `value` exists in the container.\n- `pop(value)`: Removes `value` from the container.\n- `size`: Returns the current number of elements in the container.\n\n## `TypedRedBlackTree` Container\n\n`TypedRedBlackTree` is a Red-Black Tree implementation for custom data types, optimized with `numba`. 
It provides efficient insertion, deletion, lookup, and neighbor access operations.\n\n### Example Code\n\n```python\nimport numpy as np\nimport numba as nb\nimport structron\n\n# Define a TypedRedBlackTree class for int32\nIntTree = structron.TypedRBTree(np.int32)\n\n# Instantiate a tree with a capacity of 10\ntreeset = IntTree(10)\n\n# Insert an element\ntreeset.push(1)  # Add to the tree\n\n# Check if an element exists\ntreeset.has(1)  # Returns True if the element is present\n\n# Remove an element\ntreeset.pop(1)  # Removes the element from the tree\n\n# Access neighbors\ntreeset.left(1)  # Returns the left neighbor of the key\ntreeset.right(1) # Returns the right neighbor of the key\n\n# Get container size\ntreeset.size  # Current number of elements in the tree\nlen(treeset)  # Same as `size`\n```\n\n### Methods\n\n- `TypedRBTree(dtype)`: Defines a Red-Black Tree class for the specified `dtype`.\n- `push(value)`: Adds `value` to the tree.\n- `has(value)`: Checks if `value` exists in the tree.\n- `pop(value)`: Removes `value` from the tree.\n- `left(value)`: Returns the left neighbor of `value` (smaller than `value`).\n- `right(value)`: Returns the right neighbor of `value` (larger than `value`).\n- `size`: Returns the current number of elements in the tree.\n\n\n## `TypedAVLTree` Container\n\n`TypedAVLTree` is an AVL Tree implementation for custom data types, optimized with `numba`. Its interface is identical to `TypedRedBlackTree`.\n\n### Example Code\n\n```python\nimport numpy as np\nimport numba as nb\nimport structron\n\n# Define a TypedAVLTree class for int32\nIntTree = structron.TypedAVLTree(np.int32)\n\n# Same as TypedRBTree\n```\n\n### Methods\n\nThe interface is identical to `TypedRedBlackTree`. Refer to the `TypedRedBlackTree` documentation for details.\n\n\n## Map Mode: `\u003cHeap, Hash, RedBlackTree, AVLTree\u003e`\n\nThe map mode allows using `Heap`, `Hash`, `RedBlackTree`, and `AVLTree` as key-value containers. 
The `TypedXXX` function accepts two parameters: the key type and the value type. The interface extends the non-map mode by requiring a key and value for `push`, and supports accessing values using the key as a subscript.\n\n### Example Code (Using `RedBlackTree`)\n\n```python\nimport numpy as np\nimport numba as nb\nimport structron\n\n# Custom point structure\nt_point = np.dtype([('x', np.float32), ('y', np.float32)])\n\n# Build two points\np1, p2 = np.array([(1,1), (2,2)], t_point)\n\n# Define a TypedRBTree class for int32 keys and point values\nIntPointTree = structron.TypedRBTree(np.int32, t_point)\n\n# Instantiate a tree with a capacity of 10\ntreemap = IntPointTree(10)\n\n# Insert a key-value pair\ntreemap.push(1, p1)  # Push key=1, value=(1, 1)\n\n# Access the value by key\ntreemap[1]  # Returns the value (x=1.0, y=1.0)\n\n# Other methods are the same as the non-map mode\n# e.g., treemap.has(1), treemap.pop(1), treemap.left(1), treemap.right(1)\n```\n\n### Supported Containers\n\nThe map mode is supported for the following containers:\n- **Heap**: `TypedHeap(key_dtype, value_dtype)`\n- **Hash**: `TypedHash(key_dtype, value_dtype)`\n- **RedBlackTree**: `TypedRBTree(key_dtype, value_dtype)`\n- **AVLTree**: `TypedAVLTree(key_dtype, value_dtype)`\n\n### Key Differences from Non-Map Mode\n1. The `TypedXXX` function accepts two parameters: `key_dtype` and `value_dtype`.\n2. The `push` method requires both a key and a value: `push(key, value)`.\n3. Values can be accessed using the key as a subscript: `container[key]`.\n4. Other methods (e.g., `has`, `pop`, `left`, `right`) remain the same as in the non-map mode.\n\n## Eval Mode: `\u003cHeap, RedBlackTree, AVLTree\u003e`\n\nThe eval mode allows using `Heap`, `RedBlackTree`, and `AVLTree` as containers whose keys are computed from the stored values by an eval function. The `TypedXXX` function accepts two parameters: the eval function and the value type. 
The interface extends the map mode: pass `None` as the key, and the eval function computes the key from the value.\n\n### Example Code (Using `Heap`)\n\n```python\nimport numpy as np\nimport numba as nb\nimport structron\n\n# Custom point structure\nt_point = np.dtype([('x', np.float32), ('y', np.float32)])\n\n# Build two points\np1, p2 = np.array([(1,1), (2,2)], t_point)\n\n# A simple eval function; `self` is the heap object.\n# You can subclass the heap class, add properties, and use them in the eval function.\neval = lambda self, p: p.x + p.y\n\n# Define a TypedHeap class for point values with eval function\nPointHeap = structron.TypedHeap(eval, t_point)\n\n# Instantiate a heap with capacity of 10\nheap = PointHeap(10)\n\n# Insert 2 points\nheap.push(None, p1)\nheap.push(None, p2)\n\n# Get the top element\nheap.top()\n\n# Other methods are the same as the map mode\n```\n\n### Supported Containers\n\nThe eval mode is supported for the following containers:\n- **Heap**: `TypedHeap(eval, value_dtype)`\n- **RedBlackTree**: `TypedRBTree(eval, value_dtype)`\n- **AVLTree**: `TypedAVLTree(eval, value_dtype)`\n\n### Differences from Map Mode\nIf the `eval` function is static (i.e., the evaluation result does not change over time), it is recommended to precompute the keys and use **Map Mode** instead. This approach is more efficient because the keys are calculated once and reused, avoiding repeated evaluations during container operations.\n\nIn contrast, **Eval Mode** dynamically evaluates values during container adjustments. For example, in a Red-Black Tree, the tree does not reorder all values when a new value is pushed. Instead, the new value is dynamically compared with relevant elements as needed. 
This is particularly useful in scenarios like the event queue in computational geometry (e.g., sweep line algorithms), where values need to be dynamically evaluated and ordered based on changing conditions.\n\n### More Containers\nMore containers will be added over time. Suggestions and contributions are welcome.\n\n## Demo\nstructron was originally developed as a framework for implementing computational geometry algorithms. Here we demonstrate its usage with a simple yet classic example and run a performance test.\n\n### Convex Hull\nAlgorithm Implementation:\n1. Sort the points by their X-coordinate.\n2. Build the upper half: Process the points from left to right, starting by pushing the leftmost point onto the stack. For each new point, check the last two points on the stack. If they form a right turn with the new point, push the new point onto the stack. Otherwise, pop the top element until a right turn is obtained or fewer than two points remain, then push the new point.\n3. Build the lower half: Process the points from right to left, starting by pushing the rightmost point onto the stack, and apply the same right-turn test as in step 2.\n4. 
Combine the upper and lower halves to obtain the convex hull.\n\n```python\nimport matplotlib.pyplot as plt\n\nimport numpy as np\nimport numba as nb\nimport structron\n\n# build Point dtype and PointStack\nt_point = np.dtype([('x', np.float32), ('y', np.float32)])\nPointStack = structron.TypedStack(t_point)\n\n# push points onto the stack one by one; pop while the last turn is not a right turn\n@nb.njit\ndef convex_line(pts, idx):\n    hull = PointStack(128)\n    for i in idx:\n        p2 = pts[i]\n        while hull.size\u003e1:\n            p1 = hull.top(0)\n            p0 = hull.top(1)\n            # s is twice the signed area of triangle (p0, p1, p2); negative means a right turn\n            s = p0.x*p1.y - p0.y*p1.x\n            s += p1.x*p2.y - p1.y*p2.x\n            s += p2.x*p0.y - p2.y*p0.x\n            if s\u003c-1e-6: break\n            hull.pop()\n        hull.push(p2)\n    return hull.body[:hull.size]\n\n# get up line and down line, then concat the hull\n@nb.njit\ndef convexhull(pts):\n    idx = np.argsort(pts['x'])\n    up = convex_line(pts, idx)\n    down = convex_line(pts, idx[::-1])\n    return np.concatenate((up, down[1:]))\n\nif __name__ == '__main__':\n    from time import time\n    # use randn to make 102400 random points\n    pts = np.random.randn(102400, 2).astype(np.float32)\n    pts = pts.ravel().view(t_point)\n\n    # the first call triggers JIT compilation, so only the second call is timed\n    hull = convexhull(pts)\n    start = time()\n    hull = convexhull(pts)\n    print('convex hull of 102400 point cost:', time()-start)\n```\n![1714464150709](https://github.com/Image-Py/structron/assets/24822467/576eec48-5d0d-4d17-a84d-58ca70279845)\n\nWe then compare performance against Shapely's convex hull computation on datasets of the same size.\n```python\nfrom shapely import geometry as geom\n\npts = np.random.randn(102400, 2).astype(np.float32)\nmpoints = geom.MultiPoint(pts)\nstart = time()\ncvxh = mpoints.convex_hull\nprint('the same points by shapely cost:', time()-start)\n```\nWe got the results below:\n```\nconvex hull of 102400 point cost: 0.01891\nthe same points by shapely cost: 0.04986\n\nconvex hull of 1024000 point cost: 
0.23035\nthe same points by shapely cost: 1.08539\n```\nAs the data volume grows, the gap widens: Shapely is about 2.6 times slower at 102,400 points and about 4.7 times slower at 1,024,000 points. Note that this comparison covers only the convex hull computation and excludes the time spent constructing the data. In practice, the overhead of converting large NumPy arrays into Shapely objects is also significant. I therefore believe it is worthwhile to build efficient 2D and 3D geometry algorithms in Python with Numba.\n\n## Recruitment\nThis project requires in-depth knowledge of Numba and expertise in computational geometry. Anyone interested is welcome to join the development.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FVectorElectron%2Fstructron","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FVectorElectron%2Fstructron","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FVectorElectron%2Fstructron/lists"}