{"id":16402762,"url":"https://github.com/demoriarty/doksparse","last_synced_at":"2025-02-23T16:13:00.082Z","repository":{"id":179576982,"uuid":"616972569","full_name":"DeMoriarty/DOKSparse","owner":"DeMoriarty","description":"sparse DOK tensors on GPU, pytorch","archived":false,"fork":false,"pushed_at":"2023-07-09T03:15:24.000Z","size":339,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-01-05T05:23:56.780Z","etag":null,"topics":["cuda","pytorch","sparse"],"latest_commit_sha":null,"homepage":"","language":"Cuda","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DeMoriarty.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-03-21T13:01:50.000Z","updated_at":"2024-04-17T16:35:28.000Z","dependencies_parsed_at":"2023-07-22T15:31:41.647Z","dependency_job_id":null,"html_url":"https://github.com/DeMoriarty/DOKSparse","commit_stats":null,"previous_names":["demoriarty/doksparse"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DeMoriarty%2FDOKSparse","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DeMoriarty%2FDOKSparse/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DeMoriarty%2FDOKSparse/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DeMoriarty%2FDOKSparse/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DeMoriarty","download_url":"https://codeload.github.com
/DeMoriarty/DOKSparse/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240339584,"owners_count":19785957,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cuda","pytorch","sparse"],"created_at":"2024-10-11T05:47:10.226Z","updated_at":"2025-02-23T16:13:00.067Z","avatar_url":"https://github.com/DeMoriarty.png","language":"Cuda","readme":"# Sparse DOK (Dictionary of Keys) Tensors on GPU\nCommon sparse matrix/tensor formats such as COO, CSR and LIL do not support constant-time access/assignment of individual tensor elements. Because of that, PyTorch supports very limited indexing operations for its sparse tensor formats, and numpy-like advanced indexing is not supported for the most part. \n\n**DOK (Dictionary of Keys)** is a sparse tensor format that uses a hashmap to store index-value pairs. Accessing any individual element, including elements that are zero, is theoretically constant time. DOK format can also be converted to uncoalesced COO format with minimal cost.\n\nThis repository contains an implementation of the sparse DOK tensor format in CUDA and pytorch, as well as the hashmap that serves as its backbone. The main goal of this project is to make sparse tensors behave as closely to dense tensors as possible.\n\n**note:** currently only nvidia gpus are supported, contributions for cpu/rocm/metal support are welcome!\n## Installation\nThis package depends on [pytorch](https://pytorch.org/), [cupy](https://docs.cupy.dev/en/stable/install.html) and [sympy](https://docs.sympy.org/latest/install.html). 
Please make sure to have recent versions of these packages before installing sparse_dok.\n```bash\npip install sparse-dok\n```\n\n## Quick Start\n### Sparse DOK Tensor\n#### construction and conversion\nthere are various ways of creating a sparse DOK tensor:\n\n1. construct with indices and values (similar to `torch.sparse_coo_tensor`):\n```python\nindices = torch.arange(100, device=device)[None].expand(2, -1)\nvalues = torch.randn(100, device=device)\n\ndok_tensor = SparseDOKTensor(size=(100, 100), indices=indices, values=values)\n```\n\n2. create an empty tensor first, set items later:\n```python\nimport torch\nfrom sparse_dok import SparseDOKTensor\n\ndevice = \"cuda:0\"\ndok_tensor = SparseDOKTensor(size=(100, 100), dtype=torch.float, device=device)\n\nindices = torch.arange(100, device=device)\n\ndok_tensor[indices, indices] = 1 # now this is a sparse identity matrix!\n# assert torch.allclose(dok_tensor.to_dense(), torch.eye(100, device=device))\n```\n3. convert from a dense tensor or sparse COO tensor:\n```python\ndok_tensor = SparseDOKTensor.from_dense(dense_tensor)\ndok_tensor = SparseDOKTensor.from_sparse_coo(coo_tensor)\n```\n\nyou can also convert a sparse DOK tensor to a dense or sparse COO tensor:\n```python\ndense_tensor = dok_tensor.to_dense()\ncoo_tensor = dok_tensor.to_sparse_coo()\n```\n\n#### pytorch functions\nsparse DOK tensors can be used in all pytorch functions that accept `torch.sparse_coo_tensor` as input, including some functions in `torch` and `torch.sparse`. 
In these cases, the sparse DOK tensor will simply be converted to `torch.sparse_coo_tensor` before entering the function.\n\n```python\ntorch.add(dok_tensor, another_dok_tensor) # returns sparse coo tensor\ntorch.sparse.sum(dok_tensor, dim=0)\ntorch.sparse.mm(dok_tensor, dense_tensor)\n...\n```\n\nSome `torch.Tensor` class methods are also implemented:\n\n#### indexing, slicing and mutating\nthese methods are currently supported:\n`select()`, `index_select()`, `__getitem__()`, `__setitem__()`, `transpose()`, `transpose_()`, `permute()`, `permute_()`, `T`, `t()`, `flatten()`, `reshape()`\n\n**note**: `flatten()` and `reshape()` create a copy of the original tensor and rehash all the index-value pairs, which makes them time-consuming.\n\n**note**: `transpose()`, `permute()`, `T` and `t()` return a view of the original tensor that shares the same storage.\n\n**note**: `__getitem__()` and `__setitem__()` support advanced slicing/indexing. for example:\n```python\ndok_tensor = SparseDOKTensor(size=(10,), values=values, indices=indices)\n\n# indexing with integers\ndok_tensor[0]\n\n# indexing with lists/ndarrays/tensors of integers\ndok_tensor[ [3, 6, 9] ]\n# output shape: (3,)\n\ndok_tensor[ np.arange(5) ]\n# output shape: (5,)\n\ndok_tensor[ torch.randint(10, size=(2, 2) ) ] \n# output shape: (2, 2)\n\n# indexing with boolean mask\nmask = torch.arange(10) \u003c 5\ndok_tensor[ mask ]\n# output shape: (5,)\n\n# slicing\ndok_tensor[0:5]\n# output shape: (5,)\n\n# and any combination of the above\ndok_tensor = SparseDOKTensor(size=(13, 11, 7, 3), values=values, indices=indices)\ndok_tensor[5, :4, [3, 5, 7], torch.tensor([True, False, True])]\n# output shape: (4, 3, 2)\n```\n\n**note**: `__getitem__()` always returns a dense tensor; similarly, `__setitem__()` needs either a scalar value or a broadcastable dense tensor as input. 
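The scalar-or-broadcastable contract of `__setitem__()` mirrors plain dense-tensor assignment. A minimal dense `torch` sketch of the same pattern (CPU-only illustration, no sparse_dok involved; shapes chosen arbitrarily):

```python
import torch

# dense stand-in for the sparse DOK __setitem__ contract
t = torch.zeros(100, 100)
idx = torch.arange(100)

# scalar assignment, as in dok_tensor[indices, indices] = 1
t[idx, idx] = 1
assert torch.allclose(t, torch.eye(100))

# broadcastable dense tensor: a (100,)-shaped row is broadcast
# across the 5 selected rows
t[0:5] = torch.ones(100)
assert torch.allclose(t[0:5], torch.ones(5, 100))
```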
Sometimes slicing a large tensor may result in out-of-memory errors, so use it with caution.\n\n#### some special operators\n`sum()`, `softmax()`, `log_softmax()`, `norm()`, `normalize()`, `normalize_()`\n\n**note**: `normalize()` is similar to `torch.nn.functional.normalize()`, and `normalize_()` is its in-place version.\n\n#### other methods\n`dtype`, `device`, `shape`, `ndim`, `sparsity`, `is_sparse`, `indices()`, `values()`, `_indices()`, `_values()`, `_nnz()`, `size()`, `clone()`, `resize()`,\n`to_sparse_coo()`, `to_sparse_csr()`, `to_dense()`, `is_coalesced()`,\n`abs()`, `sqrt()`, `ceil()`, `floor()`, `round()`\n\n**note**: currently `torch.abs(dok_tensor)` returns a sparse COO tensor, while `dok_tensor.abs()` returns a sparse DOK tensor; the same goes for all other unary functions. this behavior may change in the future.\n\n### CUDA Hashmap\n`CudaClosedHashmap` is a simple hashmap with closed hashing and linear probing. Both keys and values can be arbitrarily shaped tensors \\*. All keys must have the same shape and data type; the same goes for all values. Get/set/remove operations are performed in batches; by taking advantage of the GPU's parallel processing power, millions of operations can be performed in a fraction of a second. \n\n\\* the number of elements each key and value can have is limited.\n\n#### basic usage\n```python\nimport torch\nfrom sparse_dok import CudaClosedHashmap\n\nhashmap = CudaClosedHashmap()\n\nn_keys = 1_000_000\nkeys = torch.rand(n_keys, 3, device=\"cuda:0\")\nvalues = torch.randint(2**63-1, size=(n_keys, 5), device=\"cuda:0\", dtype=torch.long)\n\n### set items\nhashmap[keys] = values\n# or \nis_set = hashmap.set(keys, values)\n\n\n### get items\nresult, is_found = hashmap[keys]\n# or\nresult, is_found = hashmap.get(keys, fallback_value=0)\n\n### remove items\ndel hashmap[keys]\n# or \nis_removed = hashmap.remove(keys)\n```\n\n#### some details\n##### 1. storage\n`hashmap._keys` and `hashmap._values` are the tensors where the key-value pairs are stored; their shapes are `(num_buckets, *key_shape)` and `(num_buckets, *value_shape)`. For each unique key, a 64-bit uuid is computed using a second hash function and stored alongside the keys and values. The uuids are stored in `hashmap._uuid`, which has a shape of `(num_buckets, )`. \n`hashmap.keys()`, `hashmap.values()` and `hashmap.uuid()` filter out unoccupied buckets and return only the items stored by the user.\n\n`hashmap.n_buckets` is the current capacity of the hashmap.\nThe number of key-value pairs stored in the hashmap can be obtained from `hashmap.n_elements` or `len(hashmap)`.\n\n##### 2. automatic rehashing\nRehashing is triggered when the load factor (`n_elements / n_buckets`) of the hashmap reaches `rehash_threshold`. During a rehash, the capacity of the hashmap increases, and all the items are rehashed with a different random hash function. The new number of buckets in the hashmap is equal to `n_elements * rehash_factor`. By default, `rehash_threshold = 0.75` and `rehash_factor = 2.0`.\n\nTo prevent frequent rehashing, you can set a higher initial capacity for the hashmap, or set `rehash_factor` to a higher value. Increasing `rehash_threshold` is not recommended, because it may cause severe performance degradation.\n\n```python\nhashmap = CudaClosedHashmap(n_buckets=100000, rehash_factor=4.0)\n```\n\nYou can also manually rehash a hashmap to a desired capacity:\n```python\nhashmap.rehash(n_buckets=200000)\n```\n\n##### 3. nondeterministic behavior\nTo reduce the global memory access latency of the cuda kernels, one optimization in `CudaClosedHashmap` is to use the uuids of keys to check whether two keys are identical, rather than comparing the keys directly. The uuids are computed using a second hash function, and can take values ranging from $0$ to $2^{63} - 1$. 
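For a sense of scale, a rough birthday-bound estimate (an illustration, not a figure from the library) of the chance that any two of `n` random keys share a uuid:

```python
import math

# birthday bound: P(any collision) ~= 1 - exp(-n*(n-1) / (2*M)),
# with M = 2**63 possible uuid values
def uuid_collision_probability(n, m=2**63):
    return 1.0 - math.exp(-n * (n - 1) / (2 * m))

# even with a million distinct keys, the estimated probability of
# any two sharing a uuid is on the order of 1e-8
p = uuid_collision_probability(1_000_000)
assert p < 1e-7
```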
It's technically possible for two keys to have the same uuid, but the chances are small, and even if that happens, as long as the two keys don't end up in the same bucket (determined by the first hash function), it will not cause any problem. Only when two different keys happen to produce the exact same hashcodes under both hash functions will one of them be misidentified as the other.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdemoriarty%2Fdoksparse","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdemoriarty%2Fdoksparse","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdemoriarty%2Fdoksparse/lists"}