{"id":17748413,"url":"https://github.com/aaalgo/kgraph","last_synced_at":"2025-03-15T00:30:57.847Z","repository":{"id":32908623,"uuid":"36503514","full_name":"aaalgo/kgraph","owner":"aaalgo","description":"A library for k-nearest neighbor search","archived":false,"fork":false,"pushed_at":"2024-04-24T02:17:22.000Z","size":1177,"stargazers_count":356,"open_issues_count":7,"forks_count":85,"subscribers_count":23,"default_branch":"master","last_synced_at":"2024-04-24T04:45:43.953Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aaalgo.png","metadata":{"files":{"readme":"README.md","changelog":"Changes","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2015-05-29T12:38:24.000Z","updated_at":"2024-04-24T02:29:07.000Z","dependencies_parsed_at":"2024-04-24T03:38:57.550Z","dependency_job_id":null,"html_url":"https://github.com/aaalgo/kgraph","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aaalgo%2Fkgraph","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aaalgo%2Fkgraph/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aaalgo%2Fkgraph/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aaalgo%2Fkgraph/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aaalgo","download_url":"https://codeload.github.com/aaalgo/kgraph/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://
github.com","kind":"github","repositories_count":243667655,"owners_count":20328032,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-26T10:01:46.586Z","updated_at":"2025-03-15T00:30:57.414Z","avatar_url":"https://github.com/aaalgo.png","language":"C++","funding_links":[],"categories":["Open Sources","Multidimensional data / Vectors"],"sub_categories":[],"readme":"KGraph: A Library for Approximate Nearest Neighbor Search\n=========================================================\n\n# Introduction\n\nKGraph is a library for k-nearest neighbor (k-NN) graph construction and\nonline k-NN search using a k-NN graph as the index.  KGraph implements\nheuristic algorithms that are extremely generic and fast:\n* KGraph works on abstract objects.  The only assumption it makes is\nthat a similarity score can be computed on any pair of objects, with\na user-provided function.\n* KGraph is among the fastest libraries for k-NN search according to a [recent benchmark](https://github.com/erikbern/ann-benchmarks).\n\nFor best generality, the C++ API should be used.  A Python wrapper\nis provided under the module name kgraph, which supports Euclidean\nand angular distances on rows of NumPy matrices.\n\n\n**Note:** `pykgraph` has been renamed to `kgraph`.\n\n# Building and Installation\n\nKGraph depends on a recent version of GCC with C++11 support, cmake\nand the Boost library.  
The package can be built and installed with\n```sh\ncmake -DCMAKE_BUILD_TYPE=release .\nmake\nsudo make install\n```\n\nA Makefile.plain is also provided in case cmake is not available.\n\nThe Python API can be installed with\n```sh\npython setup.py install\n```\n\n# Python Quick Start\n\n```python\nfrom numpy import random\nimport kgraph\n\ndataset = random.rand(1000000, 16)\nquery = random.rand(1000, 16)\n\nindex = kgraph.KGraph(dataset, 'euclidean')  # another option is 'angular'\nindex.build(reverse=-1)                      # reverse=-1: recommended when the index is built for search\nindex.save(\"index_file\")\n# load with index.load(\"index_file\")\n\nknn = index.search(query, K=10)                       # this uses all CPU threads\nknn = index.search(query, K=10, threads=1)            # one thread, slower\nknn = index.search(query, K=1000, P=100)              # search for 1000-nn, no need to recompute index.\n```\n\nBoth index.build and index.search support a number of optional keyword\narguments to fine-tune performance.  The default values should work\nreasonably well for many datasets.  One exception is that reverse=-1 should be\nadded if the purpose of building the index is to speed up search, which is the\ntypical case, rather than to obtain the k-NN graph itself.\n\nTwo precautions should be taken:\n* Although matrices of both float32 and float64 are supported, the latter is not optimized.  It is recommended that\nmatrices be converted to float32 before being passed into kgraph.\n* The dimension (columns of matrices) should be a multiple of 4.  If not, the rows must be padded with zeros.\n\nFor performance reasons, the Python API does not support user-defined similarity functions,\nas the callback is invoked at such a high frequency that, if written in Python, it would\ninevitably negate the speedup.  
For full generality, the C++ API should be used.\n\n# C++ Quick Start\n\nThe KGraph C++ API is based on two central concepts: the index oracle and the search oracle.\n(An oracle is a fancy name for a user-defined callback function that behaves like a black box.)\nKGraph works solely with object IDs from 0 to N-1, and relies on the oracles to map the IDs to\nactual data objects and to compute the similarity. To use KGraph, the user has to extend the following\ntwo abstract classes:\n\n```cpp\n    class IndexOracle {\n    public:\n        /// Returns the size N of the dataset.\n        virtual unsigned size () const = 0;\n        /// Computes similarity of objects i and j, 0 \u003c= i, j \u003c N.\n        virtual float operator () (unsigned i, unsigned j) const = 0;\n    };\n\n    class SearchOracle {\n    public:\n        /// Returns the size N of the dataset.\n        virtual unsigned size () const = 0;\n        /// Computes similarity of query and object 0 \u003c= i \u003c N.\n        virtual float operator () (unsigned i) const = 0;\n    };\n```\n\nThe similarity values computed by the oracles must satisfy the following two conditions:\n* The more similar the objects are, the smaller the similarity value (0.1 \u003c 10, -10 \u003c 1).\n* Similarity must be symmetric, i.e. f(a, b) = f(b, a).\n\nKGraph's heuristic algorithm does not make assumptions about properties such as\nthe triangle inequality.  
If the similarity is ill-defined, the worst that can happen is lower\naccuracy and slower computation.\n\nWith the oracle classes defined, index construction and online search become straightforward:\n\n```cpp\n#include \u003cvector\u003e\n#include \u003ckgraph.h\u003e\n\nusing namespace std;\nusing namespace kgraph;\n\nKGraph *index = KGraph::create();\n\nif (need_to_create_new_index) {\n    MyIndexOracle oracle(...);\t// subclass of kgraph::IndexOracle\n    KGraph::IndexParams params;  \n    params.reverse = -1;\n    index-\u003ebuild(oracle, params);\n    index-\u003esave(\"some_path\");\n}\nelse {\n    index-\u003eload(\"some_path\");\n}\n\nMySearchOracle oracle(...);\t// subclass of kgraph::SearchOracle\n\nKGraph::SearchParams params;\nparams.K = K;\nvector\u003cunsigned\u003e knn(K);    \t// to save K-NN ids.\nindex-\u003esearch(oracle, params, \u0026knn[0]);\n// knn now contains the IDs of the k-NNs, most similar first\n\ndelete index;\n```\n\nNote that the search API does not directly imply nearest neighbor search.  Rather,\nit is a generic API for minimizing a function on top of a graph, and finds the K\nnodes where the function assumes minimal values.\n\n# More Documentation\n### Oracles for Common Tasks\nKGraph provides a number of [efficient oracle implementations](doc/oracle.md) for\ncommon tasks. \n### [Parameter Tuning](doc/params.md)\n### [Doxygen Documentation](http://aaalgo.github.io/kgraph/doc/html/annotated.html)\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faaalgo%2Fkgraph","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faaalgo%2Fkgraph","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faaalgo%2Fkgraph/lists"}