https://github.com/dstein64/aghasher
An implementation of Anchor Graph Hashing (Liu et al. 2011) in Python.
- Host: GitHub
- URL: https://github.com/dstein64/aghasher
- Owner: dstein64
- License: mit
- Created: 2015-04-21T04:51:20.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2024-05-12T22:23:21.000Z (6 months ago)
- Last Synced: 2024-11-01T07:41:07.189Z (14 days ago)
- Topics: anchor-graph-hashing, hashing, locality-sensitive-hashing, machine-learning, numpy, python
- Language: Python
- Homepage: https://pypi.org/project/aghasher/
- Size: 26.2 MB
- Stars: 9
- Watchers: 4
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
[![Build Status](https://github.com/dstein64/aghasher/workflows/build/badge.svg)](https://github.com/dstein64/aghasher/actions)
aghasher
========

An implementation of the Anchor Graph Hashing algorithm (1-AGH), presented in *Hashing with Graphs* (Liu et al. 2011).

Dependencies
------------

*aghasher* supports `python>=3.6`, with numpy and scipy. These should be linked with a BLAS implementation
(e.g., OpenBLAS, ATLAS, Intel MKL). Without being linked to BLAS, numpy/scipy will use a fallback that causes
*aghasher* to run over 50x slower.

Installation
------------

[aghasher](https://pypi.python.org/pypi/aghasher) is available on PyPI, the Python Package Index.

```sh
$ pip install aghasher
```

How To Use
----------

To use aghasher, first import the *aghasher* module.

    import aghasher

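The calls in the sections that follow take a data matrix `X` and an anchor matrix `anchors`. As a minimal sketch of preparing such inputs with NumPy, the snippet below builds synthetic data and draws the anchors as a random subset of its rows. The synthetic data, the sizes, the seed, and the random-subset choice are all assumptions made for illustration; Liu et al. obtain anchors as k-means cluster centers, and aghasher leaves anchor selection to the caller.

```python
import numpy as np

# Hypothetical sizes for illustration: n observations, d dimensions, m anchors.
n, d, m = 1000, 16, 50

rng = np.random.default_rng(0)

# Synthetic training data: an n-by-d array (rows are observations).
X = rng.normal(size=(n, d))

# Anchors: an m-by-d array. Here they are m distinct rows of X chosen at
# random, which is a simplification (not something aghasher does for you).
anchor_idx = rng.choice(n, size=m, replace=False)
anchors = X[anchor_idx]

# The anchors must have the same dimensionality as the training data.
assert anchors.shape == (m, X.shape[1])

# With aghasher installed, these arrays could then be passed to train, e.g.:
# agh, H_train = aghasher.AnchorGraphHasher.train(X, anchors)
```
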
### Training a Model

An AnchorGraphHasher is constructed using the *train* method, which returns an AnchorGraphHasher and the hash bit
embedding for the training data.

    agh, H_train = aghasher.AnchorGraphHasher.train(X, anchors, num_bits, nn_anchors, sigma)

AnchorGraphHasher.train takes 5 arguments:

* **X** An *n-by-d* numpy.ndarray with training data. The rows correspond to *n* observations, and the columns
correspond to *d* dimensions.
* **anchors** An *m-by-d* numpy.ndarray with anchors. *m* is the total number of anchors. Rows correspond to anchors,
and columns correspond to dimensions. The dimensionality of the anchors must match the dimensionality of the training
data.
* **num_bits** (optional; defaults to 12) Number of hash bits for the embedding.
* **nn_anchors** (optional; defaults to 2) Number of nearest anchors that are used for approximating the neighborhood
structure.
* **sigma** (optional; defaults to *None*) sigma for the Gaussian radial basis function that is used to determine
similarity between points. When sigma is specified as *None*, the code will automatically set a value, depending on
the training data and anchors.

### Hashing Data with an AnchorGraphHasher Model

With an AnchorGraphHasher object, which has variable name *agh* in the preceding and following examples, hashing
out-of-sample data is done with the object's *hash* method.

    agh.hash(X)

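The array returned by *hash* holds one boolean per hash bit. Where a more compact representation is wanted, one option is to pack each row into bytes with `numpy.packbits`. This is a sketch of a post-processing step that is not part of aghasher; the array `H` below is a random stand-in for real *hash* output, and its shape is an assumption.

```python
import numpy as np

# Stand-in for the output of agh.hash(X): a boolean array with one row per
# observation and one column per hash bit (5 observations, 12 bits here).
rng = np.random.default_rng(0)
H = rng.random((5, 12)) < 0.5

# Pack each row of bits into bytes; 12 bits need 2 bytes, zero-padded.
packed = np.packbits(H, axis=1)

# Recover the boolean array, discarding the padding bits.
restored = np.unpackbits(packed, axis=1, count=H.shape[1]).astype(bool)
```

The `count` argument of `numpy.unpackbits` drops the padding added during packing, so the round trip returns an array with the original number of columns.
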
The hash method takes one argument:

* **X** An *n-by-d* numpy.ndarray with data. The rows correspond to *n* observations, and the columns correspond to *d*
dimensions. The dimensionality of the data must match the dimensionality of the training data used to train the
AnchorGraphHasher.

Since Python does not have a native bit vector data structure, the hash method returns an *n-by-r* numpy.ndarray, where
*n* is the number of observations in *X*, and *r* is the number of hash bits specified when the model was trained.
The elements of the returned array are boolean values that correspond to bits.

### Testing an AnchorGraphHasher Model

Testing is performed with the AnchorGraphHasher.test method.

    precision = aghasher.AnchorGraphHasher.test(H_train, H_test, y_train, y_test, radius)

AnchorGraphHasher.test takes 5 arguments:

* **H_train** An *n-by-r* numpy.ndarray with the hash bit embedding corresponding to the training data. The rows
correspond to the *n* observations, and the columns correspond to the *r* hash bits.
* **H_test** An *m-by-r* numpy.ndarray with the hash bit embedding corresponding to the testing data. The rows
correspond to the *m* observations, and the columns correspond to the *r* hash bits.
* **y_train** An *n-by-1* numpy.ndarray with the ground truth labels for the training data.
* **y_test** An *m-by-1* numpy.ndarray with the ground truth labels for the testing data.
* **radius** (optional; defaults to 2) Hamming radius to use for calculating precision.

Tests
-----

Tests are in [tests/](https://github.com/dstein64/aghasher/blob/master/tests).

```sh
# Run tests
$ python3 -m unittest discover tests -v
```

Differences from the Matlab Reference Implementation
----------------------------------------------------

The code is structured differently than the Matlab reference implementation.

The Matlab code implements an additional hashing method, hierarchical hashing (referred to as 2-AGH), an extension of
1-AGH that is not implemented here.

There is one functional difference relative to the Matlab code. If *sigma* is specified (as opposed to being
auto-estimated), then for the same value of *sigma*, the Matlab and Python code will produce different results. They
will produce the same results when the Matlab *sigma* is sqrt(2) times bigger than the manually specified *sigma* in the
Python code. This is because in the Gaussian RBF kernel, the Python code uses a 2 in the denominator of the exponent,
and the Matlab code does not. A 2 was included in the denominator of the Python code, as that is the canonical way to
use an RBF kernel.

License
-------

*aghasher* has an [MIT License](https://en.wikipedia.org/wiki/MIT_License).
See [LICENSE](LICENSE).

References
----------

Liu, Wei, Jun Wang, Sanjiv Kumar, and Shih-Fu Chang. 2011. “Hashing with Graphs.” In Proceedings of the 28th
International Conference on Machine Learning (ICML-11), edited by Lise Getoor and Tobias Scheffer, 1–8. ICML ’11. New
York, NY, USA: ACM.