{"id":15043607,"url":"https://github.com/dstein64/aghasher","last_synced_at":"2025-08-02T15:35:22.277Z","repository":{"id":30748499,"uuid":"34304969","full_name":"dstein64/aghasher","owner":"dstein64","description":"An implementation of Anchor Graph Hashing (Liu et al. 2011) in Python.","archived":false,"fork":false,"pushed_at":"2025-01-01T18:58:14.000Z","size":27473,"stargazers_count":9,"open_issues_count":0,"forks_count":3,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-07-06T23:49:01.469Z","etag":null,"topics":["anchor-graph-hashing","hashing","locality-sensitive-hashing","machine-learning","numpy","python"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/aghasher/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dstein64.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2015-04-21T04:51:20.000Z","updated_at":"2025-01-01T18:58:17.000Z","dependencies_parsed_at":"2025-01-01T19:34:56.986Z","dependency_job_id":"a18fad92-d5c2-42d8-96b1-38aabd40e1c7","html_url":"https://github.com/dstein64/aghasher","commit_stats":{"total_commits":41,"total_committers":1,"mean_commits":41.0,"dds":0.0,"last_synced_commit":"fae8103fd2abc1a4212ea88fcf940627e8605117"},"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/dstein64/aghasher","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dstein64%2Faghasher","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dstein64%2Faghasher/tags","releases_url"
:"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dstein64%2Faghasher/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dstein64%2Faghasher/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dstein64","download_url":"https://codeload.github.com/dstein64/aghasher/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dstein64%2Faghasher/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":268082357,"owners_count":24192992,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-31T02:00:08.723Z","response_time":66,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anchor-graph-hashing","hashing","locality-sensitive-hashing","machine-learning","numpy","python"],"created_at":"2024-09-24T20:49:20.048Z","updated_at":"2025-08-02T15:35:22.219Z","avatar_url":"https://github.com/dstein64.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Build Status](https://github.com/dstein64/aghasher/workflows/build/badge.svg)](https://github.com/dstein64/aghasher/actions)\n\naghasher\n========\n\nAn implementation of the Anchor Graph Hashing algorithm (AGH-1), presented in *Hashing with Graphs* (Liu et al. 2011).\n\nDependencies\n------------\n\n*aghasher* supports `python\u003e=3.6`, with numpy and scipy. 
These should be linked with a BLAS implementation\n(e.g., OpenBLAS, ATLAS, Intel MKL). Without being linked to BLAS, numpy/scipy will use a fallback that causes\nPyAnchorGraphHasher to run over 50x slower.\n\nInstallation\n------------\n\n[aghasher](https://pypi.python.org/pypi/aghasher) is available on PyPI, the Python Package Index.\n\n```sh\n$ pip install aghasher\n```\n\nHow To Use\n----------\n\nTo use aghasher, first import the *aghasher* module.\n\n    import aghasher\n    \n### Training a Model\n\nAn AnchorGraphHasher is constructed using the *train* method, which returns an AnchorGraphHasher and the hash bit\nembedding for the training data.\n\n    agh, H_train = aghasher.AnchorGraphHasher.train(X, anchors, num_bits, nn_anchors, sigma)\n\nAnchorGraphHasher.train takes 5 arguments:\n\n* **X** An *n-by-d* numpy.ndarray with training data. The rows correspond to *n* observations, and the columns\n  correspond to *d* dimensions.\n* **anchors** An *m-by-d* numpy.ndarray with anchors. *m* is the total number of anchors. Rows correspond to anchors,\n  and columns correspond to dimensions. The dimensionality of the anchors must match the dimensionality of the training\n  data.\n* **num_bits** (optional; defaults to 12) Number of hash bits for the embedding.\n* **nn_anchors** (optional; defaults to 2) Number of nearest anchors that are used for approximating the neighborhood\n  structure.\n* **sigma** (optional; defaults to *None*) sigma for the Gaussian radial basis function that is used to determine\n  similarity between points. 
When sigma is specified as *None*, the code will automatically set a value, depending on\n  the training data and anchors.\n\n### Hashing Data with an AnchorGraphHasher Model\n\nWith an AnchorGraphHasher object, which has variable name *agh* in the preceding and following examples, hashing\nout-of-sample data is done with the object's *hash* method.\n\n    agh.hash(X)\n    \nThe hash method takes one argument:\n\n* **X** An *n-by-d* numpy.ndarray with data. The rows correspond to *n* observations, and the columns correspond to *d*\ndimensions. The dimensionality of the data must match the dimensionality of the training data used to train the\nAnchorGraphHasher.\n\nSince Python does not have a native bit vector data structure, the hash method returns an *n-by-r* numpy.ndarray, where\n*n* is the number of observations in *X*, and *r* is the number of hash bits specified when the model was trained.\nThe elements of the returned array are boolean values that correspond to bits.\n\n### Testing an AnchorGraphHasher Model\n\nTesting is performed with the AnchorGraphHasher.test method.\n\n    precision = aghasher.AnchorGraphHasher.test(H_train, H_test, y_train, y_test, radius)\n    \nAnchorGraphHasher.test takes 5 arguments:\n\n* **H_train** An *n-by-r* numpy.ndarray with the hash bit embedding corresponding to the training data. The rows\n  correspond to the *n* observations, and the columns correspond to the *r* hash bits.\n* **H_test** An *m-by-r* numpy.ndarray with the hash bit embedding corresponding to the testing data. 
The rows\n  correspond to the *m* observations, and the columns correspond to the *r* hash bits.\n* **y_train** An *n-by-1* numpy.ndarray with the ground truth labels for the training data.\n* **y_test** An *m-by-1* numpy.ndarray with the ground truth labels for the testing data.\n* **radius** (optional; defaults to 2) Hamming radius to use for calculating precision.\n\nTests\n-----\n\nTests are in [tests/](https://github.com/dstein64/aghasher/blob/master/tests).\n\n```sh\n# Run tests\n$ python3 -m unittest discover tests -v\n```\n\nDifferences from the Matlab Reference Implementation\n----------------------------------------------------\n\nThe code is structured differently than the Matlab reference implementation.\n\nThe Matlab code implements an additional hashing method, hierarchical hashing (referred to as 2-AGH), an extension of\n1-AGH that is not implemented here.\n\nThere is one functional difference relative to the Matlab code. If *sigma* is specified (as opposed to being\nauto-estimated), then for the same value of *sigma*, the Matlab and Python code will produce different results. They\nwill produce the same results when the Matlab *sigma* is sqrt(2) times bigger than the manually specified *sigma* in the\nPython code. This is because in the Gaussian RBF kernel, the Python code uses a 2 in the denominator of the exponent,\nand the Matlab code does not. A 2 was included in the denominator of the Python code, as that is the canonical way to\nuse an RBF kernel.\n\nLicense\n-------\n\n*aghasher* has an [MIT License](https://en.wikipedia.org/wiki/MIT_License).\n\nSee [LICENSE](LICENSE).\n\nReferences\n----------\n\nLiu, Wei, Jun Wang, Sanjiv Kumar, and Shih-Fu Chang. 2011. “Hashing with Graphs.” In Proceedings of the 28th\nInternational Conference on Machine Learning (ICML-11), edited by Lise Getoor and Tobias Scheffer, 1–8. ICML ’11. 
New\nYork, NY, USA: ACM.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdstein64%2Faghasher","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdstein64%2Faghasher","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdstein64%2Faghasher/lists"}