{"id":23281364,"url":"https://github.com/jpata/sparsedistance","last_synced_at":"2025-09-06T22:31:44.416Z","repository":{"id":151599646,"uuid":"306009785","full_name":"jpata/SparseDistance","owner":"jpata","description":"Generate sparse distance matrices with gradients in tensorflow efficiently","archived":false,"fork":false,"pushed_at":"2020-10-22T17:28:48.000Z","size":9353,"stargazers_count":3,"open_issues_count":0,"forks_count":3,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-08-21T13:45:22.924Z","etag":null,"topics":["graph","knn-graph-construction","lsh","machinelearning","tensorflow"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jpata.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2020-10-21T11:59:14.000Z","updated_at":"2022-11-01T11:50:50.000Z","dependencies_parsed_at":null,"dependency_job_id":"b1c2609c-a295-439c-8666-64f1db56f82d","html_url":"https://github.com/jpata/SparseDistance","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/jpata/SparseDistance","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jpata%2FSparseDistance","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jpata%2FSparseDistance/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jpata%2FSparseDistance/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jpata%2FSparseDistanc
e/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jpata","download_url":"https://codeload.github.com/jpata/SparseDistance/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jpata%2FSparseDistance/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273973464,"owners_count":25200575,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-06T02:00:13.247Z","response_time":2576,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["graph","knn-graph-construction","lsh","machinelearning","tensorflow"],"created_at":"2024-12-19T23:46:52.457Z","updated_at":"2025-09-06T22:31:44.402Z","avatar_url":"https://github.com/jpata.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"SparseDistance\n==============\n\n\n[![DOI](https://zenodo.org/badge/306009785.svg)](https://zenodo.org/badge/latestdoi/306009785)\n\n```bash\npython3 -m pip install git+https://github.com/jpata/SparseDistance.git@v0.1\n(or just copy the files from the repo to your project)\n```\n\nEfficiently generate sparse graph adjacency matrices using tensorflow, including gradient propagation and minibatches, for graph sizes up to 100k+ in subquadratic time.\n\nOn the following images, you see the input set on the left and the learned graph structure (edges) on the right for a toy clustering 
problem with approximately 5000 input elements per graph.\n\u003cp float=\"left\"\u003e\n  \u003cimg src=\"images/graph_noedge.png\" alt=\"Input set without edges\" width=\"300\"/\u003e\n  \u003cimg src=\"images/graph.png\" alt=\"Generated graph with edges\" width=\"300\"/\u003e\n\u003c/p\u003e\n\nHere, we show the learned distance matrix on the left and the scaling of the training time on the right.\n\u003cp float=\"left\"\u003e\n  \u003cimg src=\"images/dm.png\" alt=\"Generated adjacency matrix\" width=\"300\"/\u003e\n  \u003cimg src=\"images/timing.png\" alt=\"Scaling of the complexity with input size\" width=\"300\"/\u003e\n\u003c/p\u003e\n\nHere's how it works:\n- *Input*: a set of elements with features, `shape=(N_batch, N_elem, N_feat)`, possibly in minibatches for efficient training (e.g. a minibatch may consist of several sets/graphs padded to the same size)\n- *Output*: a sparse adjacency matrix for each input set, `shape=(N_batch, N_elem, N_elem)`, the elements of which can be differentiated with respect to the input\n- *Hyperparameters*: bin size M, number of neighbors K, LSH codebook size (maximum number of bins) L\n\nThe input data is divided into equal-sized bins using locality-sensitive hashing (LSH) based on random rotations. In each bin, we run a dense k-nearest-neighbors algorithm and update the final sparse adjacency matrix. The generated graph consists of `N_elem/bin_size` disjoint subgraphs.\nThe maximum input size is determined by the pre-generated LSH codebook size. 
Since the bin size is much smaller than the input size, the k-nearest-neighbors evaluation is efficient.\nThe input features to the hashing and knn can be learnable, so that the binning \u0026 knn graph construction can adapt to the problem based on gradient descent.\n\n```python\nimport tensorflow as tf\nimport numpy as np\nfrom sparsedistance.models import SparseHashedNNDistance\nfrom sparsedistance.utils import sparse_dense_matmult_batch\n\nnum_batches = 10\nnum_points_per_batch = 1000\nnum_features = 32\n\nX = np.array(np.random.randn(num_batches, num_points_per_batch, num_features), dtype=np.float32)\ny = np.array(np.random.randn(num_batches, num_points_per_batch, ), dtype=np.float32)\n\n#show that we can take a gradient of stuff with respect to the distance matrix values (but not indices!)\ndense_transform = tf.keras.layers.Dense(128)\ndm_layer = SparseHashedNNDistance(max_num_bins=200, bin_size=500, num_neighbors=5)\n\nwith tf.GradientTape(persistent=True) as g:\n    X_transformed = dense_transform(X)\n    dm = dm_layer(X_transformed)\n\n    ret = sparse_dense_matmult_batch(dm, X)\n\n    #reduce the output to a single scalar, just for demonstration purposes\n    ret = tf.reduce_sum(ret)\n\ngrad = g.gradient(ret, dense_transform.weights)\n```\n\nFeatures:\n - [x] Works on a modest GPU (e.g. 2060S) or a CPU\n - [x] Uses only native TF 2.x operations, no compilation needed\n - [x] Fast evaluation and efficient memory use\n - [x] TF graph mode for easy deployment\n - [x] TF eager mode for debugging\n\nBased on the Reformer [1] (LSH approach and description) and GravNet [2] (knn graph construction) papers.\n\n - [1] https://arxiv.org/abs/2001.04451\n - [2] https://arxiv.org/abs/1902.07987\n\nIf you use this code academically, please cite this repository as follows:\n\n - Joosep Pata. (2020, October 22). jpata/SparseDistance v0.1 (Version v0.1). Zenodo. 
http://doi.org/10.5281/zenodo.4117570\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjpata%2Fsparsedistance","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjpata%2Fsparsedistance","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjpata%2Fsparsedistance/lists"}