{"id":13568266,"url":"https://github.com/microsoft/DiskANN","last_synced_at":"2025-04-04T04:31:02.415Z","repository":{"id":37567901,"uuid":"273157574","full_name":"microsoft/DiskANN","owner":"microsoft","description":"Graph-structured Indices for Scalable, Fast, Fresh and Filtered Approximate Nearest Neighbor Search","archived":false,"fork":false,"pushed_at":"2025-03-28T15:46:18.000Z","size":154777,"stargazers_count":1289,"open_issues_count":199,"forks_count":267,"subscribers_count":32,"default_branch":"main","last_synced_at":"2025-03-30T21:38:01.066Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/microsoft.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-06-18T06:18:06.000Z","updated_at":"2025-03-30T21:16:30.000Z","dependencies_parsed_at":"2023-09-29T03:23:09.222Z","dependency_job_id":"5347b83b-0db7-4637-9b92-5316b0d6e0eb","html_url":"https://github.com/microsoft/DiskANN","commit_stats":{"total_commits":240,"total_committers":40,"mean_commits":6.0,"dds":0.7958333333333334,"last_synced_commit":"38cf26d88e50ebe64b9c932f8a657ba907ac3fe9"},"previous_names":[],"tags_count":15,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FDiskANN","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FDiskANN/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FDiskANN/releases"
,"manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FDiskANN/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/microsoft","download_url":"https://codeload.github.com/microsoft/DiskANN/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247123072,"owners_count":20887259,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T14:00:22.364Z","updated_at":"2025-04-04T04:30:57.408Z","avatar_url":"https://github.com/microsoft.png","language":"C++","funding_links":[],"categories":["Concepts \u0026 Definitions","C++","5. 
Retrieval-Augmented Generation (RAG) \u0026 Knowledge","Data Processing \u0026 Memory","Rust"],"sub_categories":[],"readme":"# DiskANN\n\n[![DiskANN Main](https://github.com/microsoft/DiskANN/actions/workflows/push-test.yml/badge.svg?branch=main)](https://github.com/microsoft/DiskANN/actions/workflows/push-test.yml)\n[![PyPI version](https://img.shields.io/pypi/v/diskannpy.svg)](https://pypi.org/project/diskannpy/)\n[![Downloads shield](https://pepy.tech/badge/diskannpy)](https://pepy.tech/project/diskannpy)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n\n[![DiskANN Paper](https://img.shields.io/badge/Paper-NeurIPS%3A_DiskANN-blue)](https://papers.nips.cc/paper/9527-rand-nsg-fast-accurate-billion-point-nearest-neighbor-search-on-a-single-node.pdf)\n[![DiskANN Paper](https://img.shields.io/badge/Paper-Arxiv%3A_Fresh--DiskANN-blue)](https://arxiv.org/abs/2105.09613)\n[![DiskANN Paper](https://img.shields.io/badge/Paper-Filtered--DiskANN-blue)](https://harsha-simhadri.org/pubs/Filtered-DiskANN23.pdf)\n\n\nDiskANN is a suite of scalable, accurate and cost-effective approximate nearest neighbor search algorithms for large-scale vector search that support real-time changes and simple filters.\nThis code is based on ideas from the [DiskANN](https://papers.nips.cc/paper/9527-rand-nsg-fast-accurate-billion-point-nearest-neighbor-search-on-a-single-node.pdf), [Fresh-DiskANN](https://arxiv.org/abs/2105.09613) and the [Filtered-DiskANN](https://harsha-simhadri.org/pubs/Filtered-DiskANN23.pdf) papers with further improvements. 
\nThis code was forked from the [code for the NSG](https://github.com/ZJULearning/nsg) algorithm.\n\nThis project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).\nFor more information, see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or\ncontact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.\n\nSee [guidelines](CONTRIBUTING.md) for contributing to this project.\n\n## Linux build:\n\nInstall the following packages with apt:\n\n```bash\nsudo apt install make cmake g++ libaio-dev libgoogle-perftools-dev clang-format libboost-all-dev\n```\n\n### Install Intel MKL\n#### Ubuntu 20.04 or newer\n```bash\nsudo apt install libmkl-full-dev\n```\n\n#### Earlier versions of Ubuntu\nInstall Intel MKL either by downloading the [oneAPI MKL installer](https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl.html) or using [apt](https://software.intel.com/en-us/articles/installing-intel-free-libs-and-python-apt-repo) (we tested with builds 2019.4-070 and 2022.1.2.146).\n\n```bash\n# oneAPI MKL installer\nwget https://registrationcenter-download.intel.com/akdlm/irc_nas/18487/l_BaseKit_p_2022.1.2.146.sh\nsudo sh l_BaseKit_p_2022.1.2.146.sh -a --components intel.oneapi.lin.mkl.devel --action install --eula accept -s\n```\n\n### Build\n```bash\nmkdir build \u0026\u0026 cd build \u0026\u0026 cmake -DCMAKE_BUILD_TYPE=Release .. \u0026\u0026 make -j\n```\n\n## Windows build:\n\nThe Windows version has been tested with Enterprise editions of Visual Studio 2022, 2019 and 2017. It should work with the Community and Professional editions as well without any changes. 
\n\n**Prerequisites:**\n\n* CMake 3.15+ (available in Visual Studio 2019+ or from https://cmake.org)\n* NuGet.exe (install from https://www.nuget.org/downloads)\n    * The build script will use NuGet to get MKL, OpenMP and Boost packages.\n* DiskANN git repository checked out together with submodules. To check out submodules after git clone:\n```\ngit submodule init\ngit submodule update\n```\n\n* Environment variables: \n    * [optional] If you would like to override the Boost library listed in windows/packages.config.in, set BOOST_ROOT to your Boost folder.\n\n**Build steps:**\n* Open the \"x64 Native Tools Command Prompt for VS 2019\" (or corresponding version) and change to the DiskANN folder\n* Create a \"build\" directory inside it\n* Change to the \"build\" directory and run\n```\ncmake ..\n```\nOR for Visual Studio 2017 and earlier:\n```\n\u003cfull-path-to-installed-cmake\u003e\\cmake ..\n```\n**This will create a diskann.sln solution**. Now you can:\n\n- Open it in Visual Studio and build either the Release or Debug configuration.\n- `\u003cfull-path-to-installed-cmake\u003e\\cmake --build build`\n- Use MSBuild:\n```\nmsbuild.exe diskann.sln /m /nologo /t:Build /p:Configuration=\"Release\" /property:Platform=\"x64\"\n```\n\n* This will also build the gperftools submodule for the libtcmalloc_minimal dependency.\n* Generated binaries are stored in the x64/Release or x64/Debug directories.\n\n## Usage:\n\nPlease see the following pages on using the compiled code:\n\n- [Command-line interface for building and searching SSD-based indices](workflows/SSD_index.md)  \n- [Command-line interface for building and searching in-memory indices](workflows/in_memory_index.md) \n- [Command-line examples for using in-memory streaming indices](workflows/dynamic_index.md)\n- [Command-line interface for building and searching in-memory indices with label data and filters](workflows/filtered_in_memory.md)\n- [Command-line interface for building and searching SSD-based indices with label data and 
filters](workflows/filtered_ssd_index.md)\n- [diskannpy - DiskANN as a Python extension module](python/README.md)\n\nPlease cite this software in your work as:\n\n```\n@misc{diskann-github,\n   author = {Simhadri, Harsha Vardhan and Krishnaswamy, Ravishankar and Srinivasa, Gopal and Subramanya, Suhas Jayaram and Antonijevic, Andrija and Pryce, Dax and Kaczynski, David and Williams, Shane and Gollapudi, Siddarth and Sivashankar, Varun and Karia, Neel and Singh, Aditi and Jaiswal, Shikhar and Mahapatro, Neelam and Adams, Philip and Tower, Bryan and Patel, Yash},\n   title = {{DiskANN: Graph-structured Indices for Scalable, Fast, Fresh and Filtered Approximate Nearest Neighbor Search}},\n   url = {https://github.com/Microsoft/DiskANN},\n   version = {0.6.1},\n   year = {2023}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmicrosoft%2FDiskANN","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmicrosoft%2FDiskANN","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmicrosoft%2FDiskANN/lists"}