https://github.com/hariprasath-v/secure_offline_rag_system

Developing a retrieval system using semantic vector search.

Host: GitHub
URL: https://github.com/hariprasath-v/secure_offline_rag_system
Owner: hariprasath-v
License: mit
Created: 2024-11-13T08:45:50.000Z
Default Branch: main
Last Pushed: 2024-11-13T09:04:40.000Z
Last Synced: 2025-03-02T13:50:36.809Z
Topics: crossencoder, faiss, python, sentence-transformers, trustii-team
Language: Jupyter Notebook
Homepage:
Size: 20.5 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Secure_Offline_RAG_System

### Developing a retrieval system using semantic vector search.

### Competition hosted on [Trustii.io](https://app.trustii.io/datasets/1529)

## Objectives of this challenge
**Build a Flexible Local RAG System:** Develop a RAG system that generates embeddings using an open-source LLM. The system must run locally without relying on external API calls, and it should be flexible enough to handle various types of text data, including but not limited to Q&A datasets, websites, code snippets, and documentation.
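
A minimal sketch of the local, source-agnostic storage this objective calls for, assuming the embedding model is wrapped in a plain `embed(text) -> list[float]` callable. In this project that role would be played by the sentence-transformers model. It is injected here so the sketch runs without downloads. `LocalVectorStore`, its methods, and the `kind` tag are hypothetical names, and the exact brute-force cosine search only stands in for a real index such as FAISS.

```python
import json
import math
from typing import Callable, Dict, List

Embed = Callable[[str], List[float]]


def _normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


class LocalVectorStore:
    """Hypothetical in-memory store; nothing leaves the local machine."""

    def __init__(self, embed: Embed):
        self.embed = embed            # any local embedding function
        self.items: List[Dict] = []   # each item holds text, kind, vector

    def add(self, text: str, kind: str = "text") -> None:
        # 'kind' records the source type (Q&A, web page, code, docs, ...)
        self.items.append({"text": text, "kind": kind,
                           "vector": _normalize(self.embed(text))})

    def search(self, query: str, k: int = 3) -> List[Dict]:
        # Exact cosine search over every stored vector (a FAISS index would replace this)
        q = _normalize(self.embed(query))
        scored = [
            {"text": it["text"], "kind": it["kind"],
             "score": sum(a * b for a, b in zip(q, it["vector"]))}
            for it in self.items
        ]
        scored.sort(key=lambda r: r["score"], reverse=True)
        return scored[:k]

    def save(self, path: str) -> None:
        # Persist to a local JSON file; no external service involved
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.items, f)

    def load(self, path: str) -> None:
        with open(path, encoding="utf-8") as f:
            self.items = json.load(f)
```

Because every content type goes through the same `embed` callable, Q&A rows, web text, and code snippets can share one store and be told apart afterwards by their `kind` tag.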

**Create a Versatile Local Chat Interface:** Build a chat interface that interacts with the vector store generated from the text embeddings and stored locally. The interface should let users query the embeddings and retrieve relevant information, which a locally executed LLM then uses to generate responses. To demonstrate the system's flexibility, it should handle different content types (such as code snippets) and multiple languages, specifically queries in English and French.
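
One way the chat step could turn retrieved passages into a prompt for a locally executed LLM is sketched below. The function name `build_prompt`, the prompt wording, and the instruction to answer in the question's language (to cover the English/French requirement) are assumptions for illustration, not taken from the repository. The returned string would be passed to whatever local model runtime is used.

```python
from typing import List


def build_prompt(question: str, passages: List[str], max_passages: int = 3) -> str:
    """Hypothetical helper: assemble a grounded prompt from the top retrieved passages."""
    # Number the passages so the model (and the user) can refer back to them
    context = "\n\n".join(
        f"[{i}] {p.strip()}" for i, p in enumerate(passages[:max_passages], start=1)
    )
    return (
        "Answer the question using only the context below. "
        "Reply in the same language as the question (e.g. English or French). "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}\nAnswer:"
    )


# Example: a French query with two retrieved passages
print(build_prompt("Qu'est-ce que FAISS ?", [
    "FAISS is a library for efficient similarity search.",
    "HNSW is a graph-based approximate nearest-neighbour index.",
]))
```

Keeping only the first few re-ranked passages bounds the prompt length, which matters for small models running on local hardware.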

## My Approach
- Created embeddings with the sentence-transformers model `multi-qa-MiniLM-L6-cos-v1`.
- Built a FAISS HNSW index over the embeddings for efficient approximate nearest-neighbour search.
- Retrieved the top 10 results using dense embeddings, formed query-response pairs from these results, and re-ranked them with the cross-encoder model `ms-marco-MiniLM-L-12-v2`.
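
The retrieve-then-rerank flow above can be sketched as follows. To keep the sketch runnable without downloading models, the bi-encoder and cross-encoder are injected as plain callables, and the FAISS HNSW search is replaced by an exact brute-force inner-product search. In the notebook these roles would presumably be filled by `SentenceTransformer("multi-qa-MiniLM-L6-cos-v1").encode`, a `faiss.IndexHNSWFlat` index, and `CrossEncoder(...).predict` over `(query, passage)` pairs. The function names and the `k`/`top_n` parameters are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def dense_top_k(query_vec: Vector, doc_vecs: List[Vector], k: int) -> List[int]:
    """Exact inner-product search; a stand-in for the (approximate) FAISS HNSW index."""
    scores = [sum(q * d for q, d in zip(query_vec, vec)) for vec in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)[:k]


def retrieve_and_rerank(
    query: str,
    docs: List[str],
    embed: Callable[[str], Vector],                               # bi-encoder
    cross_score: Callable[[List[Tuple[str, str]]], List[float]],  # cross-encoder
    k: int = 10,
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Stage 1: dense retrieval of k candidates. Stage 2: cross-encoder re-ranking."""
    doc_vecs = [embed(d) for d in docs]        # in practice embedded once and indexed
    candidates = dense_top_k(embed(query), doc_vecs, k)
    pairs = [(query, docs[i]) for i in candidates]   # (query, passage) pairs
    scores = cross_score(pairs)                # one relevance score per pair
    ranked = sorted(zip((docs[i] for i in candidates), scores),
                    key=lambda t: t[1], reverse=True)
    return ranked[:top_n]
```

The split reflects a cost trade-off: the bi-encoder scores every document cheaply through the index, while the slower but more accurate cross-encoder only sees the k shortlisted pairs. With real models, encoding with `normalize_embeddings=True` makes the inner product equal to cosine similarity.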

## How to Run
Simply follow the steps in the provided Jupyter Notebook.