{"id":13525975,"url":"https://github.com/ICME-Lab/Vectune","last_synced_at":"2025-04-01T06:30:50.952Z","repository":{"id":230096152,"uuid":"776112095","full_name":"ClankPan/Vectune","owner":"ClankPan","description":null,"archived":false,"fork":false,"pushed_at":"2024-04-10T21:18:21.000Z","size":160,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-04-11T03:05:10.538Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ClankPan.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2024-03-22T17:50:49.000Z","updated_at":"2024-04-14T21:41:14.001Z","dependencies_parsed_at":"2024-04-14T21:41:05.469Z","dependency_job_id":"afc7f426-9e00-4941-a947-0aa0ff2c4f7c","html_url":"https://github.com/ClankPan/Vectune","commit_stats":null,"previous_names":["clankpan/vectune"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ClankPan%2FVectune","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ClankPan%2FVectune/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ClankPan%2FVectune/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ClankPan%2FVectune/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ClankPan","download_url":"https://codeload.github.com/ClankPan/Vectune/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://git
hub.com","kind":"github","repositories_count":222703734,"owners_count":17025838,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T06:01:23.993Z","updated_at":"2025-04-01T06:30:50.002Z","avatar_url":"https://github.com/ClankPan.png","language":"Rust","funding_links":[],"categories":["Decentralized AI"],"sub_categories":["Solana"],"readme":"# Vectune: fast Vamana indexing\n\n[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE-MIT)\n[![License: Apache 2.0](https://img.shields.io/badge/License-Apache%202.0-green.svg)](LICENSE-APACHE)\n\n\nVectune is a lightweight VectorDB with incremental indexing, based on [FreshVamana](https://arxiv.org/pdf/2105.09613.pdf).\nThis project is implemented with the support of KinicDAO and powers the backend of [KinicVectorDB](https://xcvai-qiaaa-aaaak-afowq-cai.icp0.io/) for vector indexing.\n\n## Getting Started\n\nEnable the `progress-bar` feature to monitor the progress of indexing.\n\n```toml\n[dependencies]\nvectune = {version = \"0.1.0\", features = [\"progress-bar\"]}\n```\n\nThe example computes Euclidean distances with SIMD, which requires the `nightly` toolchain. If `rust-analyzer` in VSCode reports an error for `#![feature(portable_simd)]`, set the toolchain in your `.vscode/settings.json`.\n\n```json\n{\n  \"rust-analyzer.server.extraEnv\": {\n      \"RUSTUP_TOOLCHAIN\": \"nightly\"\n  }\n}\n```\n\n## Example\n\n### Setup and Run\n\nTo test with the SIFT1M dataset, run the following commands. 
SIFT1M is a dataset of 1 million data points, each with 128 dimensions.\n\n```bash\ncurl ftp://ftp.irisa.fr/local/texmex/corpus/sift.tar.gz -o examples/test_data/sift.tar.gz\ntar -xzvf examples/test_data/sift.tar.gz -C examples/test_data\n\ncargo +nightly run --release --features progress-bar --example sift1m\n```\n\n### How it works\n\nA `Builder` builds the index from the data; searches and insertions are then performed on the resulting graph.\n\n```rust\nuse vectune::{Builder, GraphInterface, PointInterface};\n\n// `Point` and `Graph` are user-defined types that implement\n// `PointInterface` and `GraphInterface` (see the sections below).\nlet mut points = Vec::new();\nfor vec in base_vectors {\n    points.push(Point(vec.to_vec()));\n}\n\nlet (nodes, centroid) = Builder::default()\n    .progress(ProgressBar::new(1000))\n    .build(points);\n\nlet mut graph = Graph::new(nodes, centroid);\n\nlet k = 50;\n\nlet (top_k_results, _visited) = vectune::search(\u0026mut graph, \u0026Point(query.to_vec()), k);\n```\n\n### PointInterface Trait\n\nYou will need to define the dimensions and data type of the vectors used, as well as the method for calculating distance.\n\nPlease implement the following four methods:\n- `fn distance(\u0026self, other: \u0026Self) -\u003e f32`\n- `fn dim() -\u003e u32`\n- `fn add(\u0026self, other: \u0026Self) -\u003e Self`\n- `fn div(\u0026self, divisor: \u0026usize) -\u003e Self`\n\n`distance()` can be optimized using SIMD. 
Please refer to `./examples/src/bin/sift1m.rs`.\n\nThe following example provides a simple implementation.\n\n\n```rust\nuse serde::{Deserialize, Serialize};\nuse vectune::PointInterface;\n\n#[derive(Serialize, Deserialize, Clone, Debug)]\nstruct Point(Vec\u003cf32\u003e);\nimpl Point {\n    fn to_f32_vec(\u0026self) -\u003e Vec\u003cf32\u003e {\n        self.0.iter().copied().collect()\n    }\n    fn from_f32_vec(a: Vec\u003cf32\u003e) -\u003e Self {\n        Point(a.into_iter().collect())\n    }\n}\nimpl PointInterface for Point {\n    // Euclidean (L2) distance between two points.\n    fn distance(\u0026self, other: \u0026Self) -\u003e f32 {\n        self.0\n            .iter()\n            .zip(other.0.iter())\n            .map(|(a, b)| {\n                let c = a - b;\n                c * c\n            })\n            .sum::\u003cf32\u003e()\n            .sqrt()\n    }\n    fn dim() -\u003e u32 {\n        384\n    }\n    fn add(\u0026self, other: \u0026Self) -\u003e Self {\n        Point::from_f32_vec(\n            self.to_f32_vec()\n                .into_iter()\n                .zip(other.to_f32_vec().into_iter())\n                .map(|(x, y)| x + y)\n                .collect(),\n        )\n    }\n    fn div(\u0026self, divisor: \u0026usize) -\u003e Self {\n        Point::from_f32_vec(\n            self.to_f32_vec()\n                .into_iter()\n                .map(|v| v / *divisor as f32)\n                .collect(),\n        )\n    }\n}\n```\n\n\n### GraphInterface Trait\n\nTo hold the graph on storage such as SSDs or on other types of memory, you need to implement the `GraphInterface` trait.\n\nPlease implement the following eleven methods:\n- `fn alloc(\u0026mut self, point: P) -\u003e usize`\n- `fn free(\u0026mut self, id: \u0026usize)`\n- `fn cemetery(\u0026self) -\u003e Vec\u003cusize\u003e`\n- `fn clear_cemetery(\u0026mut self)`\n- `fn backlink(\u0026self, id: \u0026usize) -\u003e Vec\u003cusize\u003e`\n- `fn get(\u0026mut self, id: \u0026usize) -\u003e (P, Vec\u003cusize\u003e)`\n- `fn size_l(\u0026self) -\u003e 
usize`\n- `fn size_r(\u0026self) -\u003e usize`\n- `fn size_a(\u0026self) -\u003e f32`\n- `fn start_id(\u0026self) -\u003e usize`\n- `fn overwirte_out_edges(\u0026mut self, id: \u0026usize, edges: Vec\u003cusize\u003e)`\n\n`self.get()` is defined with `\u0026mut self` because it handles caching from SSDs and other storage devices.\n\nIn `vectune::search()`, nodes returned by `self.cemetery()` are treated as tombstones and are excluded from the search results. They are permanently deleted by `vectune::delete()`.\n\nYou need to maintain backlinks when adding or deleting nodes; `vectune::delete()` relies on them.\n\nThe following example provides a simple in-memory implementation.\n\n\n```rust\n// Alias the traits to the short names used below.\nuse vectune::{GraphInterface as VGraph, PointInterface as VPoint};\nuse itertools::Itertools;\n\nstruct Graph\u003cP\u003e\nwhere\n    P: VPoint,\n{\n    nodes: Vec\u003c(P, Vec\u003cu32\u003e)\u003e,\n    backlinks: Vec\u003cVec\u003cu32\u003e\u003e,\n    cemetery: Vec\u003cu32\u003e,\n    centroid: u32,\n}\n\nimpl\u003cP\u003e VGraph\u003cP\u003e for Graph\u003cP\u003e\nwhere\n    P: VPoint,\n{\n    fn alloc(\u0026mut self, point: P) -\u003e u32 {\n        self.nodes.push((point, vec![]));\n        self.backlinks.push(vec![]);\n        (self.nodes.len() - 1) as u32\n    }\n\n    fn free(\u0026mut self, _id: \u0026u32) {\n        // todo!()\n    }\n\n    fn cemetery(\u0026self) -\u003e Vec\u003cu32\u003e {\n        self.cemetery.clone()\n    }\n\n    fn clear_cemetery(\u0026mut self) {\n        self.cemetery = Vec::new();\n    }\n\n    fn backlink(\u0026self, id: \u0026u32) -\u003e Vec\u003cu32\u003e {\n        self.backlinks[*id as usize].clone()\n    }\n\n    fn get(\u0026mut self, id: \u0026u32) -\u003e (P, Vec\u003cu32\u003e) {\n        let node = \u0026self.nodes[*id as usize];\n        node.clone()\n    }\n\n    fn size_l(\u0026self) -\u003e usize {\n        125\n    }\n\n    fn size_r(\u0026self) -\u003e usize {\n        70\n    }\n\n    fn size_a(\u0026self) -\u003e f32 {\n        2.0\n    
}\n\n    fn start_id(\u0026self) -\u003e u32 {\n        self.centroid\n    }\n\n    fn overwirte_out_edges(\u0026mut self, id: \u0026u32, edges: Vec\u003cu32\u003e) {\n        // Remove `id` from the backlinks of its current out-neighbors.\n        // Index `self.backlinks` directly: `self.backlink()` returns a clone,\n        // so changes made through it would be lost.\n        for out_i in \u0026self.nodes[*id as usize].1 {\n            let backlinks = \u0026mut self.backlinks[*out_i as usize];\n            backlinks.retain(|backlink| backlink != id)\n        }\n\n        // Register `id` as a backlink of each new out-neighbor.\n        for out_i in \u0026edges {\n            let backlinks = \u0026mut self.backlinks[*out_i as usize];\n            backlinks.push(*id);\n            backlinks.sort();\n            backlinks.dedup();\n        }\n\n        self.nodes[*id as usize].1 = edges;\n    }\n}\n\n```\n\n## Indexing\n\n- `a` is the threshold for RobustPrune; increasing it results in more long-distance edges and fewer nearby edges.\n- `r` is the maximum number of edges per node; increasing it makes the graph denser but reduces the number of isolated nodes.\n- `l` is the size of the candidate list kept during greedy search; increasing it produces more accurate graphs at a higher computational cost.\n- `seed` initializes the random graph; fixing it makes the random graph reproducible, which is useful for debugging.\n\n```rust\nlet (nodes, centroid) = Builder::default()\n    .set_a(2.0)\n    .set_r(70)\n    .set_l(125)\n    .set_seed(11677721592066047712)\n    .progress(ProgressBar::new(1000))\n    .build(points);\n```\n\n## Searching\n\n`k` is the number of results to return. It must satisfy `k \u003c= l`.\n\n```rust\nvectune::search(\u0026mut graph, \u0026point, k);\n```\n\n## Inserting\n\n```rust\nvectune::insert(\u0026mut graph, point);\n```\n\n## Deleting\n\nCompletely removes the nodes returned by `graph.cemetery()` from the graph.\n\n```rust\nvectune::delete(\u0026mut graph);\n```\n\n## Ordering\n\nReorders the nodes so that they can be read efficiently from storage such as SSDs.\nThe algorithm is proposed in Section 4 of this [paper](https://arxiv.org/pdf/2211.12850v2.pdf). 
\n\n```rust\nvectune::gorder(\n    edges,      // Vec\u003cVec\u003cu32\u003e\u003e\n    backlinks,  // Vec\u003cVec\u003cu32\u003e\u003e\n    10,         // Number of nodes in one section\n    \u0026mut rng,\n);\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FICME-Lab%2FVectune","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FICME-Lab%2FVectune","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FICME-Lab%2FVectune/lists"}