https://github.com/logan-markewich/bm25-rs
Efficient BM25 indexing using rust
- Host: GitHub
- URL: https://github.com/logan-markewich/bm25-rs
- Owner: logan-markewich
- License: mit
- Created: 2024-09-11T03:01:38.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-09-17T02:48:10.000Z (about 1 year ago)
- Last Synced: 2025-04-03T19:01:36.650Z (7 months ago)
- Topics: bm25, index, indexing, retrieval, rust, search
- Language: Rust
- Homepage:
- Size: 15.6 KB
- Stars: 16
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# bm25-rs
This project implements an efficient version of BM25 in Rust. It supports insertion, upsert, deletion, and search.
It works by:
- Tracking document stats such as lengths and term frequencies
- Using an inverted index to quickly find the documents that contain specific terms
- Calculating BM25 TF and IDF at search time using the subset of relevant documents from the inverted index
- Keeping track of the top-k results in a binary heap for memory-efficient retrieval

## Usage
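Before the API example, the mechanism listed above can be sketched in a few dozen lines. This is a minimal illustration and not the crate's actual code. The names `SketchIndex` and `Scored`, the whitespace tokenizer, the constants `K1 = 1.2` and `B = 0.75`, and the Lucene-style IDF with a `+ 1` inside the logarithm are all assumptions, so the scores it produces will not necessarily match the ones printed by `bm25_rs`.

```rust
use std::cmp::{Ordering, Reverse};
use std::collections::{BinaryHeap, HashMap};

// Assumed BM25 parameters (common defaults, not taken from bm25-rs).
const K1: f64 = 1.2;
const B: f64 = 0.75;

// Hypothetical wrapper that gives (score, doc_id) a total order for the heap.
struct Scored(f64, u32);
impl PartialEq for Scored {
    fn eq(&self, other: &Self) -> bool { self.cmp(other) == Ordering::Equal }
}
impl Eq for Scored {}
impl PartialOrd for Scored {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> { Some(self.cmp(other)) }
}
impl Ord for Scored {
    fn cmp(&self, other: &Self) -> Ordering {
        self.0.total_cmp(&other.0).then(self.1.cmp(&other.1))
    }
}

// Hypothetical index: document lengths plus an inverted index of term -> (doc_id -> tf).
#[derive(Default)]
struct SketchIndex {
    doc_lens: HashMap<u32, usize>,
    postings: HashMap<String, HashMap<u32, usize>>,
}

impl SketchIndex {
    fn upsert(&mut self, text: &str, doc_id: u32) {
        self.delete(doc_id); // replace any previous version of the document
        let mut len = 0;
        for term in text.split_whitespace().map(str::to_lowercase) {
            *self.postings.entry(term).or_default().entry(doc_id).or_insert(0) += 1;
            len += 1;
        }
        self.doc_lens.insert(doc_id, len);
    }

    fn delete(&mut self, doc_id: u32) {
        if self.doc_lens.remove(&doc_id).is_some() {
            // Simplification: scans every posting list instead of tracking a doc's terms.
            self.postings.retain(|_, docs| {
                docs.remove(&doc_id);
                !docs.is_empty()
            });
        }
    }

    fn search(&self, query: &str, k: usize) -> Vec<(f64, u32)> {
        let n = self.doc_lens.len() as f64;
        if n == 0.0 || k == 0 {
            return Vec::new();
        }
        let avg_len = self.doc_lens.values().sum::<usize>() as f64 / n;

        // Only documents that share a term with the query are ever scored.
        let mut scores: HashMap<u32, f64> = HashMap::new();
        for term in query.split_whitespace().map(str::to_lowercase) {
            let Some(docs) = self.postings.get(&term) else { continue };
            let df = docs.len() as f64;
            let idf = ((n - df + 0.5) / (df + 0.5) + 1.0).ln();
            for (&doc_id, &tf) in docs {
                let tf = tf as f64;
                let len = self.doc_lens[&doc_id] as f64;
                let norm = tf + K1 * (1.0 - B + B * len / avg_len);
                *scores.entry(doc_id).or_insert(0.0) += idf * tf * (K1 + 1.0) / norm;
            }
        }

        // Min-heap capped at k entries: the lowest score is evicted first.
        let mut heap = BinaryHeap::with_capacity(k + 1);
        for (doc_id, score) in scores {
            heap.push(Reverse(Scored(score, doc_id)));
            if heap.len() > k {
                heap.pop();
            }
        }
        let mut out: Vec<(f64, u32)> =
            heap.into_iter().map(|Reverse(Scored(s, d))| (s, d)).collect();
        out.sort_by(|a, b| b.0.total_cmp(&a.0)); // best score first
        out
    }
}
```

Wrapping entries in `Reverse` turns the standard max-heap into a min-heap, so the heap never holds more than `k + 1` candidates regardless of how many documents match.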
```rust
use bm25_rs::Index;

let mut index = Index::new();

// Insert document text + doc_id pairs
index.upsert("I like dogs", 0);
index.upsert("I like cats", 1);

// Search with a query and a top-k
let results = index.search("like dogs", 2);

// results are (score, doc_id) tuples
// This prints:
// > Doc ID: 0 has score 0.35018749494155993
// > Doc ID: 1 has score 0.07292862271758184
for result in results {
    println!("Doc ID: {} has score {}", result.1, result.0);
}

// Delete documents
index.delete(0);
```

## TODO
- [ ] Add better/more tests
- [ ] Add CI/CD
- [ ] Support metadata filtering
- [ ] Publish the package!
- [ ] Support launching as a server from the CLI
- [ ] Support creating collections/multiple indexes in server mode