{"id":13532014,"url":"https://github.com/vioshyvo/mrpt","last_synced_at":"2025-04-01T20:31:13.461Z","repository":{"id":68139735,"uuid":"52341062","full_name":"vioshyvo/mrpt","owner":"vioshyvo","description":"Fast and lightweight header-only C++ library (with Python bindings) for approximate nearest neighbor search","archived":false,"fork":false,"pushed_at":"2020-02-14T12:58:22.000Z","size":7687,"stargazers_count":257,"open_issues_count":4,"forks_count":47,"subscribers_count":13,"default_branch":"master","last_synced_at":"2024-10-07T00:02:10.868Z","etag":null,"topics":["approximate-nearest-neighbor-search","k-nn","knn-search","mrpt","nearest-neighbor-search","random-projection","similarity-search"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vioshyvo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2016-02-23T07:49:07.000Z","updated_at":"2024-08-28T11:13:45.000Z","dependencies_parsed_at":"2024-01-14T04:40:58.130Z","dependency_job_id":"bc725857-0e75-4854-a882-06749508c0e3","html_url":"https://github.com/vioshyvo/mrpt","commit_stats":null,"previous_names":["teemupitkanen/mrpt"],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vioshyvo%2Fmrpt","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vioshyvo%2Fmrpt/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vioshyvo%2Fmrpt/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vioshyvo%2Fmrpt/manifests
","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vioshyvo","download_url":"https://codeload.github.com/vioshyvo/mrpt/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246709923,"owners_count":20821297,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["approximate-nearest-neighbor-search","k-nn","knn-search","mrpt","nearest-neighbor-search","random-projection","similarity-search"],"created_at":"2024-08-01T07:01:07.591Z","updated_at":"2025-04-01T20:31:08.446Z","avatar_url":"https://github.com/vioshyvo.png","language":"C++","funding_links":[],"categories":["Open Sources","Awesome Vector Search Engine","Multidimensional data / Vectors"],"sub_categories":["Library"],"readme":"# MRPT - fast nearest neighbor search with random projection\n\n![Fifty shades of green](docs/img/voting-candidates2.png)\n\n[![Documentation](https://img.shields.io/badge/api-reference-blue.svg)](http://vioshyvo.github.io/mrpt/html/index.html)\n\nMRPT is a lightweight and easy-to-use library for approximate nearest neighbor search. It is written in C++11 and has Python bindings. The index building has an integrated hyperparameter tuning algorithm, so the only hyperparameter required to construct the index is the target recall level! \n\nAccording to [our experiments](https://github.com/ejaasaari/mrpt-comparison/) MRPT is one of the fastest libraries for approximate nearest neighbor search.\n\nIn the offline phase of the algorithm MRPT indexes the data with a collection of *random projection trees*. 
In the online phase the index structure allows us to answer queries efficiently. A detailed description of the algorithm, with its time and space complexities, and the aforementioned comparisons can be found in [our article](https://www.cs.helsinki.fi/u/ttonteri/pub/bigdata2016.pdf) that was published at the IEEE International Conference on Big Data 2016.\n\nThe algorithm for automatic hyperparameter tuning is described in detail in our new article that will be presented at the Pacific-Asia Conference on Knowledge Discovery and Data Mining 2019 ([arxiv preprint](https://arxiv.org/abs/1812.07484)).\n\nCurrently the Euclidean distance is supported as a distance metric.\n\nThe tests for MRPT are in a separate [repo](https://github.com/vioshyvo/RP-test).\n\n## New\n\n- Release [MRPT 1.1.1](https://github.com/vioshyvo/mrpt/releases/tag/release-1.1.1): faster autotuning and bug fixes. (2018/12/07)\n\n- Release [MRPT 1.1.0](https://github.com/vioshyvo/mrpt/releases/tag/release-1.1.0): autotuning now also works without a separate set of test queries. (2018/11/24)\n\n- Release [MRPT 1.0.0](https://github.com/vioshyvo/mrpt/releases) (2018/11/22)\n\n- Add [documentation](http://vioshyvo.github.io/mrpt/html/index.html) for C++ API (2018/11/22)\n\n- Add index building with autotuning: no more manual hyperparameter tuning! (2018/11/21)\n\n## Python installation\n\nA C++ compiler is needed to build the Python wrapper.\n\nOn macOS, LLVM is needed for compiling: `brew install llvm libomp`.\n\nOn Windows, you may use the MSVC compiler.\n\nInstall the module with `pip install git+https://github.com/vioshyvo/mrpt/`\n\n### Docker\n\nAn example Dockerfile is provided, which builds the MRPT Python wrapper in a Linux environment.\n\n```shell script\ndocker build -t mrpt .\ndocker run --rm -it mrpt\n``` \n\n## Minimal examples\n\n### Python\n\nThis example first generates a 200-dimensional data set of 10000 points, and a single test query point. 
The `exact_search` function can be used to find the indices of the true 10 nearest neighbors of the test query.\n\nThe `build_autotune_sample` function then builds an index for approximate k-nn search; it uses automatic parameter tuning, so only the target recall level (90% in this example) and the number of neighbors searched for have to be specified.\n\n```python\nimport mrpt\nimport numpy as np\n\nn, d, k = 10000, 200, 10\ntarget_recall = 0.9\n\ndata = np.random.rand(n, d).astype(np.float32)\nq = np.random.rand(d).astype(np.float32)\n\nindex = mrpt.MRPTIndex(data)\nprint(index.exact_search(q, k, return_distances=False))\n\nindex.build_autotune_sample(target_recall, k)\nprint(index.ann(q, return_distances=False))\n```\n\nThe approximate nearest neighbors are then searched by the function `ann`; because the index was autotuned, no arguments other than the query point are required.\n\nHere is a sample output:\n```\n[9738 5033 6520 2108 9216 9164  112 1442 1871 8020]\n[9738 5033 6520 2108 9216 9164  112 1442 1871 6789]\n```\n\n### C++\n\nMRPT is a header-only library, so no compilation is required: just include the header `cpp/Mrpt.h`. The only dependency is the Eigen linear algebra library (Eigen 3.3.5 is bundled in `cpp/lib`), so with g++ the following minimal example can be compiled, for example, as:\n```\ng++ -std=c++11 -Ofast -march=native -Icpp -Icpp/lib ex1.cpp -o ex1 -fopenmp -lgomp\n```\n\nLet's first generate a 200-dimensional data set of 10000 points, and a query point (row = dimension, column = data point). Then `Mrpt::exact_knn` can be used to find the indices of the true 10 nearest neighbors of the test query.\n\nThe `grow_autotune` function builds an index for approximate k-nn search; it uses automatic parameter tuning, so only the target recall level (90% in this example) and the number of neighbors searched for have to be specified. 
This version automatically samples a test set of 100 query points from the data set to tune the parameters, so no separate test set is required.\n\n```c++\n#include \u003ciostream\u003e\n#include \"Eigen/Dense\"\n#include \"Mrpt.h\"\n\nint main() {\n  int n = 10000, d = 200, k = 10;\n  double target_recall = 0.9;\n  Eigen::MatrixXf X = Eigen::MatrixXf::Random(d, n);\n  Eigen::MatrixXf q = Eigen::VectorXf::Random(d);\n\n  Eigen::VectorXi indices(k), indices_exact(k);\n\n  Mrpt::exact_knn(q, X, k, indices_exact.data());\n  std::cout \u003c\u003c indices_exact.transpose() \u003c\u003c std::endl;\n\n  Mrpt mrpt(X);\n  mrpt.grow_autotune(target_recall, k);\n\n  mrpt.query(q, indices.data());\n  std::cout \u003c\u003c indices.transpose() \u003c\u003c std::endl;\n}\n```\n\nThe approximate nearest neighbors are then searched by the function `query`; because the index was autotuned, no arguments other than a query point and an output buffer for indices are required.\n\nHere is a sample output:\n```\n8108 1465 6963 2165   83 5900  662 8112 3592 5505\n8108 1465 6963 2165   83 5900 8112 3592 5505 7992\n```\nThe approximate nearest neighbor search found 9 of the 10 true nearest neighbors, so this time the observed recall happened to match the target recall exactly (results vary between runs because the algorithm is randomized).\n\n## Citation\nAutomatic hyperparameter tuning:\n~~~~\n@inproceedings{Jaasaari2019,\n  title={Efficient Autotuning of Hyperparameters in Approximate Nearest Neighbor Search},\n  author={J{\\\"a}{\\\"a}saari, Elias and Hyv{\\\"o}nen, Ville and Roos, Teemu},\n  booktitle={Pacific-Asia Conference on Knowledge Discovery and Data Mining},\n  pages={In press},\n  year={2019},\n  organization={Springer}\n}\n~~~~\n\nMRPT algorithm:\n~~~~\n@inproceedings{Hyvonen2016,\n  title={Fast nearest neighbor search through sparse random projections and voting},\n  author={Hyv{\\\"o}nen, Ville and Pitk{\\\"a}nen, Teemu and Tasoulis, Sotiris and J{\\\"a}{\\\"a}saari, 
Elias and Tuomainen, Risto and Wang, Liang and Corander, Jukka and Roos, Teemu},\n  booktitle={Big Data (Big Data), 2016 IEEE International Conference on},\n  pages={881--888},\n  year={2016},\n  organization={IEEE}\n}\n~~~~\n\n## MRPT for other languages\n\n- [Go](https://github.com/rikonor/go-ann)\n\n## License\n\nMRPT is available under the MIT License (see [LICENSE.txt](LICENSE.txt)). Note that third-party libraries in the cpp/lib folder may be distributed under other open source licenses. The Eigen library is licensed under the MPL2.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvioshyvo%2Fmrpt","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvioshyvo%2Fmrpt","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvioshyvo%2Fmrpt/lists"}