
https://github.com/analyticsinmotion/symrank

🐍📦 High-performance cosine similarity ranking for Retrieval-Augmented Generation (RAG) pipelines.

cosine-similarity python-rust rag ranking-system reranking retrieval-augmented-generation


![logo-symrank](https://github.com/user-attachments/assets/ce0b2224-d59a-4aab-a708-dcdc4968c54a)

Similarity ranking for Retrieval-Augmented Generation








## ✨ What is SymRank?
**SymRank** is a blazing-fast Python library for top-k cosine similarity ranking, designed for vector search, retrieval-augmented generation (RAG), and embedding-based matching.

Built with a Rust + SIMD backend, it offers the speed of native code with the ease of Python.


## 🚀 Why SymRank?

⚡ Fast: SIMD-accelerated cosine scoring with adaptive parallelism

🧠 Smart: Automatically selects serial or parallel mode based on workload

🔢 Top-K optimized: Efficient inlined heap selection (no full sort overhead)

🐍 Pythonic: Easy-to-use Python API

🦀 Powered by Rust: Safe, high-performance core engine

📉 Memory Efficient: Supports batching for speed and to reduce memory footprint
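
The "no full sort" point can be illustrated in plain Python. The block below is a minimal sketch, not SymRank's Rust implementation: the names `cosine` and `top_k_cosine` are hypothetical helpers introduced here only to show how a size-`k` min-heap avoids sorting every candidate.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either norm is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_cosine(query, candidates, k):
    """Hypothetical helper: keep only the k best (score, id) pairs in a min-heap."""
    if k <= 0:
        return []
    heap = []  # heap[0] always holds the weakest score currently retained
    for doc_id, vector in candidates:
        score = cosine(query, vector)
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            # Evict the weakest retained candidate in O(log k).
            heapq.heapreplace(heap, (score, doc_id))
    # Only the k survivors are sorted, never the full candidate list.
    return [{"id": doc_id, "score": score} for score, doc_id in sorted(heap, reverse=True)]
```

With `n` candidates this costs roughly `O(n log k)` comparisons instead of the `O(n log n)` of a full sort, which matters when `k` is much smaller than `n`.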


## 📦 Installation

You can install SymRank with `uv` (recommended) or with `pip`.

### Recommended (with uv):
```bash
uv pip install symrank
```

### Alternatively (using pip):
```bash
pip install symrank
```


## 🧪 Usage

### Basic Example (using Python lists)

```python
import symrank as sr

query = [0.1, 0.2, 0.3, 0.4]
candidates = [
    ("doc_1", [0.1, 0.2, 0.3, 0.5]),
    ("doc_2", [0.9, 0.1, 0.2, 0.1]),
    ("doc_3", [0.0, 0.0, 0.0, 1.0]),
]

results = sr.cosine_similarity(query, candidates, k=2)
print(results)
```

*Output*
```python
[{'id': 'doc_1', 'score': 0.9939991235733032}, {'id': 'doc_3', 'score': 0.7302967309951782}]
```
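
As a sanity check, the scores above can be reproduced by hand from the standard cosine similarity formula. With $q$ as the query and $d_i$ as the candidates from the example:

$$
\cos(q, d_1) = \frac{q \cdot d_1}{\lVert q \rVert \, \lVert d_1 \rVert}
= \frac{0.01 + 0.04 + 0.09 + 0.20}{\sqrt{0.30}\,\sqrt{0.39}}
= \frac{0.34}{\sqrt{0.117}} \approx 0.9940
$$

$$
\cos(q, d_3) = \frac{0.40}{\sqrt{0.30}\,\sqrt{1.00}} \approx 0.7303,
\qquad
\cos(q, d_2) = \frac{0.21}{\sqrt{0.30}\,\sqrt{0.87}} \approx 0.411
$$

`doc_2` has the lowest score, so with `k=2` only `doc_1` and `doc_3` are returned, and the printed scores agree with these values to four decimal places.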

### Basic Example (using NumPy arrays)

```python
import symrank as sr
import numpy as np

query = np.array([0.1, 0.2, 0.3, 0.4], dtype=np.float32)
candidates = [
    ("doc_1", np.array([0.1, 0.2, 0.3, 0.5], dtype=np.float32)),
    ("doc_2", np.array([0.9, 0.1, 0.2, 0.1], dtype=np.float32)),
    ("doc_3", np.array([0.0, 0.0, 0.0, 1.0], dtype=np.float32)),
]

results = sr.cosine_similarity(query, candidates, k=2)
print(results)
```

*Output*
```python
[{'id': 'doc_1', 'score': 0.9939991235733032}, {'id': 'doc_3', 'score': 0.7302967309951782}]
```


## 🧩 API: cosine_similarity(...)

```python
cosine_similarity(
    query_vector,       # List[float] or np.ndarray
    candidate_vectors,  # List[Tuple[str, List[float] or np.ndarray]]
    k=5,                # Number of top results to return
    batch_size=None     # Optional: set for memory-efficient batching
)
```

### `cosine_similarity(...)` Parameters

| Parameter | Type | Default | Description |
|-------------------|----------------------------------------------------|-------------|-------------|
| `query_vector` | `list[float]` or `np.ndarray` | _required_ | The query vector you want to compare against the candidate vectors. |
| `candidate_vectors`| `list[tuple[str, list[float] or np.ndarray]]` | _required_ | List of `(id, vector)` pairs. Each vector can be a list or NumPy array. |
| `k` | `int` | `5` | Number of top results to return, sorted by descending similarity. |
| `batch_size` | `int` or `None` | `None` | Optional batch size to reduce memory usage. If `None`, batching is disabled and SIMD scoring is applied to the full candidate list directly. |
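
To make the idea behind `batch_size` concrete, here is a minimal pure-Python sketch of batched top-k ranking. It is an assumption about what batching means conceptually (score one chunk at a time, carry forward only the best `k`), not a description of SymRank's internals; `cosine`, `batched`, and `top_k_batched` are hypothetical names.

```python
import heapq
import math
from itertools import islice


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either norm is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def batched(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def top_k_batched(query, candidates, k, batch_size):
    """Hypothetical helper: rank in chunks, keeping only k results between chunks."""
    best = []  # at most k (score, id) pairs, best first
    for batch in batched(candidates, batch_size):
        scored = [(cosine(query, vector), doc_id) for doc_id, vector in batch]
        # Merge this chunk's scores with the running winners and trim back to k.
        best = heapq.nlargest(k, best + scored)
    return [{"id": doc_id, "score": score} for score, doc_id in best]
```

Because only `k` winners plus one chunk of scores are held at a time, peak memory depends on `batch_size` rather than on the total number of candidates, while the final ranking is the same as scoring everything at once.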

### Returns

List of dictionaries with `id` and `score` (cosine similarity), sorted by descending similarity:

```python
[{"id": "doc_42", "score": 0.8763}, {"id": "doc_17", "score": 0.8451}, ...]
```
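
In a RAG pipeline the returned ids are typically mapped back to the text they represent. The sketch below shows one way to consume a result list of this shape; `select_context`, the `documents` mapping, and the `min_score` threshold are hypothetical and not part of SymRank.

```python
def select_context(results, documents, min_score=0.5):
    """Hypothetical helper: map ranked ids to their text, dropping weak matches.

    results   -- list of {"id": ..., "score": ...} dicts, best first
    documents -- dict mapping each id to its source text
    """
    return [
        documents[item["id"]]
        for item in results
        if item["score"] >= min_score and item["id"] in documents
    ]


# Example input in the same shape as the return value described above.
results = [{"id": "doc_42", "score": 0.8763}, {"id": "doc_17", "score": 0.8451}]
documents = {"doc_42": "First passage.", "doc_17": "Second passage."}
context = select_context(results, documents, min_score=0.85)
```

Since the results are already sorted by descending similarity, the selected passages keep their ranking order.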


## 📄 License

This project is licensed under the Apache License 2.0.