Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/aktersnurra/rag

Retrieval Augmented Generation
cohere ollama qdrant-vector-database retrieval-augmented-generation

Retrieval Augmented Generation

Host: GitHub
URL: https://github.com/aktersnurra/rag
Owner: aktersnurra
Created: 2024-04-03T20:03:14.000Z (9 months ago)
Default Branch: master
Last Pushed: 2024-04-13T11:55:16.000Z (8 months ago)
Last Synced: 2024-04-14T00:44:25.751Z (8 months ago)
Topics: cohere, ollama, qdrant-vector-database, retrieval-augmented-generation
Language: Python
Homepage:
Size: 687 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Retrieval Augmented Generation

RAG with ollama (and optionally cohere) and qdrant. This is basically a glorified
(bloated) `grep`.

## Usage

### Setup

#### 1. Environment Variables

Create a `.env` file, or set the following environment variables:

```.env
CHUNK_SIZE=4096
CHUNK_OVERLAP=256

ENCODER_MODEL=nomic-embed-text
EMBEDDING_DIM=768
RETRIEVER_TOP_K=15
RETRIEVER_SCORE_THRESHOLD=0.5

RERANK_MODEL=mixedbread-ai/mxbai-rerank-large-v1
RERANK_TOP_K=5

GENERATOR_MODEL=llama3

DOCUMENT_DB_NAME=rag
DOCUMENT_DB_USER=aktersnurra

QDRANT_URL=http://localhost:6333
QDRANT_COLLECTION_NAME=knowledge-base

# OPTIONAL
COHERE_API_KEY=
COHERE_RERANK_MODEL=rerank-english-v3.0
```
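How the project reads these values is not shown here. As a minimal sketch, assuming they are available as plain environment variables (a `.env` file would have to be exported or loaded separately), the settings could be parsed into typed fields like this. `Settings` and `load_settings` are hypothetical names, not taken from the repository, and only a subset of the variables is covered.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Typed view of some of the variables listed above (hypothetical helper)."""

    chunk_size: int
    chunk_overlap: int
    encoder_model: str
    embedding_dim: int
    retriever_top_k: int
    retriever_score_threshold: float
    rerank_top_k: int
    generator_model: str
    qdrant_url: str
    qdrant_collection_name: str
    cohere_api_key: Optional[str]  # optional, None when unset or empty


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the variables from `env`; a missing required key raises KeyError."""
    return Settings(
        chunk_size=int(env["CHUNK_SIZE"]),
        chunk_overlap=int(env["CHUNK_OVERLAP"]),
        encoder_model=env["ENCODER_MODEL"],
        embedding_dim=int(env["EMBEDDING_DIM"]),
        retriever_top_k=int(env["RETRIEVER_TOP_K"]),
        retriever_score_threshold=float(env["RETRIEVER_SCORE_THRESHOLD"]),
        rerank_top_k=int(env["RERANK_TOP_K"]),
        generator_model=env["GENERATOR_MODEL"],
        qdrant_url=env["QDRANT_URL"],
        qdrant_collection_name=env["QDRANT_COLLECTION_NAME"],
        # Treat an empty string the same as "not set".
        cohere_api_key=env.get("COHERE_API_KEY") or None,
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise without touching the real environment.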

#### 2. Install Python Dependencies

```sh
poetry install
```

#### 3. Ollama

Make sure ollama is running:

```sh
ollama serve
```

Download the encoder and generator models with ollama:

```sh
ollama pull $GENERATOR_MODEL
ollama pull $ENCODER_MODEL
```

#### 4. Qdrant

Qdrant is used to store the embeddings of the chunks from the documents.

Download and run Qdrant so that it is reachable at `QDRANT_URL`.
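To make the role of `CHUNK_SIZE` and `CHUNK_OVERLAP` concrete, here is a minimal sketch of overlapping chunking. It assumes the sizes are counted in characters, which this README does not state, and `chunk_text` is a hypothetical helper rather than the repository's own splitter.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 4096, chunk_overlap: int = 256) -> List[str]:
    """Split `text` into windows of `chunk_size` characters.

    Consecutive windows share `chunk_overlap` characters, so a sentence cut
    at a boundary still appears whole in one of the two neighbouring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text, so the
        # last chunk is never just a repeat of the previous overlap.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks
```

Each chunk would then be embedded with the encoder model and stored in the Qdrant collection along with its vector.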

#### 5. Postgres

Postgres is used to save hashes of the documents, which prevents a document from
being added to the vector db more than once.

Download and run Postgres.
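The hash-based deduplication described above can be sketched as follows. This is an illustration only: the hash algorithm (SHA-256 here) is an assumption, `document_hash` and `SeenDocuments` are hypothetical names, and an in-memory set stands in for the Postgres table.

```python
import hashlib
from typing import Set


def document_hash(content: bytes) -> str:
    """Return a hex digest identifying the document content (SHA-256 assumed)."""
    return hashlib.sha256(content).hexdigest()


class SeenDocuments:
    """In-memory stand-in for the Postgres table of document hashes."""

    def __init__(self) -> None:
        self._hashes: Set[str] = set()

    def add_if_new(self, content: bytes) -> bool:
        """Record the document and return True, or return False if already seen."""
        digest = document_hash(content)
        if digest in self._hashes:
            return False
        self._hashes.add(digest)
        return True
```

A document would be chunked, embedded, and written to the vector db only when `add_if_new` returns `True`.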

#### 6. Cohere

Get an API key from the Cohere website. This step is optional.

### Running

Activate the poetry shell:

```sh
poetry shell
```

Use the CLI:

```sh
python rag/cli.py
```

Or use the UI in a browser:

```sh
streamlit run rag/ui.py
```

### Notes

Yes, it is inefficient/dumb to use ollama when you can just load the models with python
in the same process.

### TODO

- [x] Rerank history if it is relevant.

- [x] message ollama/cohere

- [x] create db script

- [x] write a general model for cli/ui

- [ ] use huggingface instead of ollama

- [ ] Refactor messages

### Inspiration

I took some inspiration from these tutorials:

[rag-openai-qdrant](https://colab.research.google.com/github/qdrant/examples/blob/master/rag-openai-qdrant/rag-openai-qdrant.ipynb)

[building-rag-application-using-langchain-openai-faiss](https://medium.com/@solidokishore/building-rag-application-using-langchain-openai-faiss-3b2af23d98ba)

[knowledge_gpt](https://github.com/mmz-001/knowledge_gpt)