https://github.com/analyticsinmotion/symrank

🐍📦 High-performance cosine similarity ranking for Retrieval-Augmented Generation (RAG) pipelines.

cosine-similarity python-rust rag ranking-system reranking retrieval-augmented-generation

Last synced: about 1 year ago

Host: GitHub
URL: https://github.com/analyticsinmotion/symrank
Owner: analyticsinmotion
License: apache-2.0
Created: 2025-05-22T05:15:32.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2025-05-29T13:22:25.000Z (about 1 year ago)
Last Synced: 2025-05-29T14:12:59.292Z (about 1 year ago)
Topics: cosine-similarity, python-rust, rag, ranking-system, reranking, retrieval-augmented-generation
Language: Python
Homepage:
Size: 660 KB
Stars: 2
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE


![logo-symrank](https://github.com/user-attachments/assets/ce0b2224-d59a-4aab-a708-dcdc4968c54a)

Similarity ranking for Retrieval-Augmented Generation


## ✨ What is SymRank?

**SymRank** is a Python library for fast top-k cosine similarity ranking, designed for vector search, retrieval-augmented generation (RAG), and embedding-based matching.

Built with a Rust + SIMD backend, it offers the speed of native code with the ease of Python.
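The score SymRank ranks by is the standard cosine similarity: for a query vector $\mathbf{q}$ and a candidate vector $\mathbf{d}$ of dimension $n$, it is their dot product divided by the product of their norms. Top-k ranking then keeps the $k$ candidates with the largest values.

```math
\cos(\mathbf{q}, \mathbf{d})
  = \frac{\mathbf{q} \cdot \mathbf{d}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{d} \rVert}
  = \frac{\sum_{i=1}^{n} q_i d_i}{\sqrt{\sum_{i=1}^{n} q_i^2} \, \sqrt{\sum_{i=1}^{n} d_i^2}}
```

Scores range from -1 to 1, where 1 means the two vectors point in the same direction.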




## 🚀 Why SymRank?

- ⚡ Fast: SIMD-accelerated cosine scoring with adaptive parallelism
- 🧠 Smart: Automatically selects serial or parallel mode based on workload
- 🔢 Top-K optimized: Efficient inlined heap selection (no full-sort overhead)
- 🐍 Pythonic: Easy-to-use Python API
- 🦀 Powered by Rust: Safe, high-performance core engine
- 📉 Memory efficient: Supports batching to improve speed and reduce memory footprint
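The heap-based top-k idea can be illustrated in plain Python. This is a minimal sketch, not SymRank's Rust implementation: `cosine` and `top_k_cosine` are hypothetical helper names, and scoring zero-norm vectors as 0.0 is an assumption of this sketch. A min-heap capped at `k` entries costs at most O(log k) per candidate, so the whole pass is O(n log k), and only the final `k` survivors are sorted.

```python
import heapq
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_cosine(query, candidates, k=5):
    """Return the k best matches from (id, vector) pairs using a bounded min-heap."""
    if k <= 0:
        return []
    heap = []  # min-heap of (score, id); the root is the weakest of the current best k
    for doc_id, vec in candidates:
        score = cosine(query, vec)
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            # Evict the weakest kept match in O(log k).
            heapq.heapreplace(heap, (score, doc_id))
    # Only the k survivors are sorted, best first.
    return [{"id": doc_id, "score": score} for score, doc_id in sorted(heap, reverse=True)]
```

With the query and candidates from the basic example in the Usage section, `top_k_cosine(query, candidates, k=2)` ranks `doc_1` first and `doc_3` second, matching the order SymRank reports.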




## 📦 Installation

You can install SymRank with `uv` or, alternatively, with `pip`.

### Recommended (with uv):

```bash
uv pip install symrank
```

### Alternatively (using pip):

```bash
pip install symrank
```




## 🧪 Usage

### Basic Example (using Python lists)

```python
import symrank as sr

query = [0.1, 0.2, 0.3, 0.4]

# Each candidate is an (id, vector) pair.
candidates = [
    ("doc_1", [0.1, 0.2, 0.3, 0.5]),
    ("doc_2", [0.9, 0.1, 0.2, 0.1]),
    ("doc_3", [0.0, 0.0, 0.0, 1.0]),
]

# Return the 2 candidates most similar to the query.
results = sr.cosine_similarity(query, candidates, k=2)
print(results)
```

*Output*

```python
[{'id': 'doc_1', 'score': 0.9939991235733032}, {'id': 'doc_3', 'score': 0.7302967309951782}]
```
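These scores can be checked by hand. Writing $\mathbf{d}_1$, $\mathbf{d}_2$, $\mathbf{d}_3$ for the vectors of `doc_1`, `doc_2`, `doc_3`, the dot products with the query are 0.34, 0.21, and 0.40, and the squared norms of the candidates are 0.39, 0.87, and 1.00. `doc_2` scores lowest, which is why it is dropped at `k=2`; the other two values agree with the printed scores to four decimal places.

```math
\begin{aligned}
\lVert \mathbf{q} \rVert^2 &= 0.1^2 + 0.2^2 + 0.3^2 + 0.4^2 = 0.30 \\
\cos(\mathbf{q}, \mathbf{d}_1) &= \frac{0.34}{\sqrt{0.30 \times 0.39}} \approx 0.9940 \\
\cos(\mathbf{q}, \mathbf{d}_2) &= \frac{0.21}{\sqrt{0.30 \times 0.87}} \approx 0.4111 \\
\cos(\mathbf{q}, \mathbf{d}_3) &= \frac{0.40}{\sqrt{0.30 \times 1.00}} \approx 0.7303
\end{aligned}
```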

### Basic Example (using NumPy arrays)

```python
import numpy as np
import symrank as sr

query = np.array([0.1, 0.2, 0.3, 0.4], dtype=np.float32)

# Each candidate is an (id, vector) pair; vectors may be NumPy arrays.
candidates = [
    ("doc_1", np.array([0.1, 0.2, 0.3, 0.5], dtype=np.float32)),
    ("doc_2", np.array([0.9, 0.1, 0.2, 0.1], dtype=np.float32)),
    ("doc_3", np.array([0.0, 0.0, 0.0, 1.0], dtype=np.float32)),
]

results = sr.cosine_similarity(query, candidates, k=2)
print(results)
```

*Output*

```python
[{'id': 'doc_1', 'score': 0.9939991235733032}, {'id': 'doc_3', 'score': 0.7302967309951782}]
```
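When candidates already live in a single 2-D NumPy array, the same ranking can be cross-checked with a vectorized computation. This is a minimal sketch using only NumPy, not part of SymRank's API: `top_k_cosine_np` is a hypothetical name, and it assumes no zero-norm vectors (those would produce NaN scores). `np.argpartition` moves the `k` largest scores to the front without sorting the whole array, a plain-NumPy analogue of avoiding a full sort.

```python
import numpy as np


def top_k_cosine_np(query, matrix, ids, k=5):
    """Score every row of `matrix` against `query` and return the k best matches."""
    if k <= 0:
        return []
    q = np.asarray(query, dtype=np.float32)
    m = np.asarray(matrix, dtype=np.float32)
    # Cosine score per row: dot product divided by the product of the two norms.
    scores = (m @ q) / (np.linalg.norm(m, axis=1) * np.linalg.norm(q))
    k = min(k, scores.shape[0])
    # argpartition places the indices of the k largest scores first, in O(n) ...
    top = np.argpartition(-scores, k - 1)[:k]
    # ... and only those k indices are sorted, best first.
    top = top[np.argsort(-scores[top])]
    return [{"id": ids[i], "score": float(scores[i])} for i in top]
```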




## 🧩 API: `cosine_similarity(...)`

```python
cosine_similarity(
    query_vector,              # List[float] or np.ndarray
    candidate_vectors,         # List[Tuple[str, List[float] or np.ndarray]]
    k=5,                       # Number of top results to return
    batch_size=None            # Optional: set for memory-efficient batching
)
```
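`candidate_vectors` takes `(id, vector)` pairs rather than a matrix. If embeddings are stored as one 2-D NumPy array with a parallel list of ids, one way to build that input is sketched below. `doc_ids` and `embeddings` are hypothetical names, and the sketch assumes SymRank accepts row slices of a larger array the same way it accepts the standalone arrays in the NumPy example.

```python
import numpy as np

# Hypothetical inputs: document ids and a matching 2-D embedding matrix (one row per id).
doc_ids = ["doc_1", "doc_2", "doc_3"]
embeddings = np.array(
    [[0.1, 0.2, 0.3, 0.5],
     [0.9, 0.1, 0.2, 0.1],
     [0.0, 0.0, 0.0, 1.0]],
    dtype=np.float32,
)

# Iterating a 2-D array yields its rows as 1-D arrays, so zip pairs each id
# with its vector in the (id, vector) shape that candidate_vectors expects.
candidate_vectors = list(zip(doc_ids, embeddings))

# With SymRank installed, the list would then be passed as:
# results = sr.cosine_similarity(query_vector, candidate_vectors, k=2)
```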

### `cosine_similarity(...)` Parameters

| Parameter           | Type                                          | Default    | Description |
|---------------------|-----------------------------------------------|------------|-------------|
| `query_vector`      | `list[float]` or `np.ndarray`                 | _required_ | The query vector to compare against the candidate vectors. |
| `candidate_vectors` | `list[tuple[str, list[float] or np.ndarray]]` | _required_ | List of `(id, vector)` pairs. Each vector can be a list or a NumPy array. |
| `k`                 | `int`                                         | `5`        | Number of top results to return, sorted by descending similarity. |
| `batch_size`        | `int` or `None`                               | `None`     | Optional batch size to reduce memory usage. If `None`, candidates are not batched and are scored directly by the SIMD backend. |
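The README does not say how `batch_size` is applied internally. One common way to bound memory, shown here as an illustrative assumption rather than SymRank's implementation, is to score candidates one chunk at a time and keep only a running top-k between chunks. `batched_top_k` is a hypothetical name; it uses NumPy for the per-chunk scoring and assumes no zero-norm vectors.

```python
import heapq
from itertools import islice

import numpy as np


def batched_top_k(query, candidates, k=5, batch_size=1024):
    """Score (id, vector) pairs in chunks, keeping only a running top-k between chunks."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if k <= 0:
        return []
    q = np.asarray(query, dtype=np.float32)
    q_norm = np.linalg.norm(q)
    best = []  # (score, id) pairs; pruned back to at most k after every chunk
    it = iter(candidates)
    while chunk := list(islice(it, batch_size)):
        ids = [doc_id for doc_id, _ in chunk]
        m = np.asarray([vec for _, vec in chunk], dtype=np.float32)
        scores = (m @ q) / (np.linalg.norm(m, axis=1) * q_norm)
        best.extend(zip(scores.tolist(), ids))
        # heapq.nlargest keeps the k highest scores and returns them best-first.
        best = heapq.nlargest(k, best)
    return [{"id": doc_id, "score": score} for score, doc_id in best]
```

Aside from the input itself, this holds one chunk of vectors plus at most `k + batch_size` (score, id) pairs at a time, however many candidates there are; passing `candidates` as a generator avoids materializing the input too.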

### Returns

List of dictionaries with `id` and `score` (cosine similarity), sorted by descending similarity:

```python
[{"id": "doc_42", "score": 0.8763}, {"id": "doc_17", "score": 0.8451}, ...]
```
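In a RAG pipeline the returned ids are typically mapped back to their source text before prompting. The sketch below assumes a hypothetical `documents` dict from id to text and a hypothetical `min_score` cutoff; neither is part of SymRank, and the `results` list simply copies the documented shape and the scores from the basic example.

```python
# Hypothetical lookup from document id to its text.
documents = {
    "doc_1": "Text of doc_1.",
    "doc_2": "Text of doc_2.",
    "doc_3": "Text of doc_3.",
}

# Results in the documented shape, using the scores from the basic example.
results = [
    {"id": "doc_1", "score": 0.9939991235733032},
    {"id": "doc_3", "score": 0.7302967309951782},
]


def build_context(results, documents, min_score=0.75):
    """Join the text of results scoring at least `min_score`, keeping their order."""
    kept = [r for r in results if r["score"] >= min_score]
    # Results arrive sorted by descending score, so the context is best-first.
    return "\n\n".join(documents[r["id"]] for r in kept)


context = build_context(results, documents)
print(context)  # prints: Text of doc_1.  (doc_3 falls below the 0.75 cutoff)
```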




## 📄 License

This project is licensed under the Apache License 2.0.
