https://github.com/jpata/sparsedistance
Generate sparse distance matrices with gradients in tensorflow efficiently
https://github.com/jpata/sparsedistance
graph knn-graph-construction lsh machinelearning tensorflow
Last synced: about 1 month ago
JSON representation
Generate sparse distance matrices with gradients in tensorflow efficiently
- Host: GitHub
- URL: https://github.com/jpata/sparsedistance
- Owner: jpata
- License: bsd-3-clause
- Created: 2020-10-21T11:59:14.000Z (almost 5 years ago)
- Default Branch: main
- Last Pushed: 2020-10-22T17:28:48.000Z (almost 5 years ago)
- Last Synced: 2025-08-21T13:45:22.924Z (about 2 months ago)
- Topics: graph, knn-graph-construction, lsh, machinelearning, tensorflow
- Language: Python
- Homepage:
- Size: 8.92 MB
- Stars: 3
- Watchers: 3
- Forks: 3
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
SparseDistance
==============[](https://zenodo.org/badge/latestdoi/306009785)
```bash
python3 -m pip install git+https://github.com/jpata/SparseDistance.git@v0.1
(or just copy the files from the repo to your project)
```Efficiently generate sparse graph adjacency matrices using tensorflow, including gradient propagation and minibatches, for graph sizes up to 100k+ in subquadratic time.
On the following images, you see the input set on the left and the learned graph structure (edges) on the right for a toy clustering problem with approx. 5000 input elements per graph.
![]()
![]()
Here, we show the learned distance matrix on the left and the scaling of the training time on the right.
![]()
![]()
Here's how it works:
- *Input*: a set of elements with features, `shape=(N_batch, N_elem, N_feat)`, possibly in minibatches for efficient training (e.g. a minibatch may consist of several sets/graphs padded to the same size)
- *Output*: a sparse adjacency matrix for each input set `shape=(N_batch, N_elem, N_elem)`, the elements of which can be differentiated with respect to the input
- *Hyperparameters*: bin size M, number of neighbors K, LSH codebook size (maximum number of bins) LThe input data is divided into equal-sized bins with a locality sensitive hashing (LSH) which is based on random rotations. In each bin, we run a dense k-nearest-neighbors algo and update the final sparse adjacency matrix. The generated graph consists of `N_elem/bin_size` disjoint graphs.
The maximum input size is determined by the pre-generated LSH codebook size. Since the bin size is much smaller than the input size, the k-nearest-neighbors evaluation is efficient.
The input features to the hashing and knn can be learnable, so that the binning & knn graph construction can adapt to the problem based on gradient descent.```python
import tensorflow as tf
import numpy as np
from sparsedistance.models import SparseHashedNNDistance
from sparsedistance.utils import sparse_dense_matmult_batchnum_batches = 10
num_points_per_batch = 1000
num_features = 32X = np.array(np.random.randn(num_batches, num_points_per_batch, num_features), dtype=np.float32)
y = np.array(np.random.randn(num_batches, num_points_per_batch, ), dtype=np.float32)#show that we can take a gradient of stuff with respect to the distance matrix values (but not indices!)
dense_transform = tf.keras.layers.Dense(128)
dm_layer = SparseHashedNNDistance(max_num_bins=200, bin_size=500, num_neighbors=5)with tf.GradientTape(persistent=True) as g:
X_transformed = dense_transform(X)
dm = dm_layer(X_transformed)ret = sparse_dense_matmult_batch(dm, X)
#reduce the output to a single scalar, just for demonstration purposes
ret = tf.reduce_sum(ret)grad = g.gradient(ret, dense_transform.weights)
```Features:
- [x] Works on a modest GPU (e.g. 2060S) or a CPU
- [x] Uses only native TF 2.x operations, no compilation needed
- [x] Fast evaluation and efficient memory use
- [x] TF graph mode for easy deployment
- [x] TF eager mode for debuggingBased on the Reformer [1] (LSH approach and description) and GravNet [2] (knn graph construction) papers.
- [1] https://arxiv.org/abs/2001.04451
- [2] https://arxiv.org/abs/1902.07987If you use this code academically, please cite this repository as follows:
- Joosep Pata. (2020, October 22). jpata/SparseDistance v0.1 (Version v0.1). Zenodo. http://doi.org/10.5281/zenodo.4117570