https://github.com/patw/ragtag

A tool for testing RAG functionality with Atlas Vector Search

atlas mongodb vector

Host: GitHub
URL: https://github.com/patw/ragtag
Owner: patw
License: bsd-2-clause
Created: 2023-10-20T12:27:26.000Z (about 1 year ago)
Default Branch: master
Last Pushed: 2024-10-24T13:41:37.000Z (2 months ago)
Last Synced: 2024-10-25T15:44:24.974Z (2 months ago)
Topics: atlas, mongodb, vector
Language: Python
Homepage:
Size: 188 KB
Stars: 7
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# RAGTAG

A tool for manually entering RAG chunks for question/answer systems. Create, search, or edit text chunks paired
with questions to ensure good embedding-based retrieval.

RAGTAG uses MongoDB Atlas and Atlas Vector Search.

RAGTAG allows you to:

* Create Q/A chunks for use with your chatbots
* Vectorize chunks with open-source embedding models (Instructor-large)
* Search existing chunks, edit a chunk, update its embedding, and update your chatbot in real time
* Test your chunks for recall by using real questions
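Testing chunks for recall with real questions can be sketched as a recall@k check: embed each test question, rank the chunks by cosine similarity, and count how often the expected chunk lands in the top k. This is a minimal illustration, not RAGTAG's own code. The function names, chunk ids, and 3-dimensional toy vectors are all hypothetical stand-ins for real embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall_at_k(test_cases, chunks, k=3):
    """Fraction of test questions whose expected chunk is in the top k.

    test_cases: list of (question_vector, expected_chunk_id) pairs
    chunks: dict mapping chunk_id -> chunk_vector
    """
    if not test_cases:
        return 0.0
    hits = 0
    for question_vec, expected_id in test_cases:
        # Rank chunk ids from most to least similar to the question.
        ranked = sorted(
            chunks,
            key=lambda cid: cosine(question_vec, chunks[cid]),
            reverse=True,
        )
        if expected_id in ranked[:k]:
            hits += 1
    return hits / len(test_cases)


# Toy vectors (hypothetical); real embeddings would be much longer.
chunks = {
    "refunds": [1.0, 0.0, 0.0],
    "shipping": [0.0, 1.0, 0.0],
    "support": [0.0, 0.0, 1.0],
}
test_cases = [
    ([0.9, 0.1, 0.0], "refunds"),
    ([0.1, 0.9, 0.0], "shipping"),
    ([0.8, 0.0, 0.2], "support"),  # ranks "refunds" first, so a miss at k=1
]

print(recall_at_k(test_cases, chunks, k=1))
print(recall_at_k(test_cases, chunks, k=2))
```

A low score at small k suggests that the question paired with a chunk does not resemble how users actually ask, which is the kind of chunk worth editing.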

![RAGTAG UI Screenshot](images/ragtag_ui.png)

## Installation

```pip install -r requirements.txt```

## Downloading the Mistral 7B model (with Dolphin fine-tune)

```wget https://huggingface.co/TheBloke/dolphin-2.1-mistral-7B-GGUF/resolve/main/dolphin-2.1-mistral-7b.Q5_K_S.gguf```

## Running the App

Copy sample.env to .env and set the connection string for your Atlas instance, then start the app:

```flask run```

**WARNING: You will need about 20 GB of RAM to run this process! Mistral-7B requires 14 GB with the Q5 quantization, and Instructor needs 4 GB on its own.**

## Atlas Search Index

Create an Atlas Search index in the Atlas UI, under the Search tab, for the "chunks" collection
in the "ragtag" database.

```
{
  "analyzer": "lucene.english",
  "searchAnalyzer": "lucene.english",
  "mappings": {
    "dynamic": false,
    "fields": {
      "chunk_answer": {
        "type": "string"
      },
      "chunk_embedding": {
        "dimensions": 768,
        "similarity": "cosine",
        "type": "knnVector"
      },
      "chunk_enabled": {
        "type": "boolean"
      },
      "chunk_question": {
        "type": "string"
      }
    }
  }
}
```
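
To show how the fields in this index might be used together, here is a sketch of a nearest-neighbour query pipeline built as plain Python data. It is an assumption, not RAGTAG's own query: it uses the Atlas Search `knnBeta` operator (which targets `knnVector` fields and has since been deprecated in favour of `$vectorSearch`), assumes the index was saved under the name "default", and the helper name `build_knn_pipeline` is hypothetical.

```python
EMBEDDING_DIMENSIONS = 768  # matches "dimensions" in the index definition


def build_knn_pipeline(query_vector, k=5, index_name="default"):
    """Build an aggregation pipeline that finds the k nearest enabled chunks.

    index_name is an assumption; use whatever name the index was saved under.
    """
    if len(query_vector) != EMBEDDING_DIMENSIONS:
        raise ValueError(
            f"expected {EMBEDDING_DIMENSIONS} dimensions, got {len(query_vector)}"
        )
    return [
        {
            "$search": {
                "index": index_name,
                "knnBeta": {
                    "vector": list(query_vector),
                    "path": "chunk_embedding",
                    "k": k,
                    # Only consider chunks that are switched on.
                    "filter": {
                        "equals": {"path": "chunk_enabled", "value": True}
                    },
                },
            }
        },
        {
            "$project": {
                "_id": 0,
                "chunk_question": 1,
                "chunk_answer": 1,
                "score": {"$meta": "searchScore"},
            }
        },
    ]


# Placeholder vector; a real query would pass the embedding of the question.
pipeline = build_knn_pipeline([0.1] * EMBEDDING_DIMENSIONS, k=3)
print(pipeline[0]["$search"]["knnBeta"]["k"])
```

With PyMongo, a pipeline like this would be passed to `aggregate()` on the "chunks" collection of the "ragtag" database. The dimension check fails fast if the embedding model and the index definition disagree.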