https://github.com/aktersnurra/rag
Retrieval Augmented Generation
- Host: GitHub
- URL: https://github.com/aktersnurra/rag
- Owner: aktersnurra
- Created: 2024-04-03T20:03:14.000Z (almost 2 years ago)
- Default Branch: master
- Last Pushed: 2024-04-13T11:55:16.000Z (almost 2 years ago)
- Last Synced: 2024-04-14T00:44:25.751Z (almost 2 years ago)
- Topics: cohere, ollama, qdrant-vector-database, retrieval-augmented-generation
- Language: Python
- Homepage:
- Size: 687 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Retrieval Augmented Generation
RAG with ollama (and optionally cohere) and qdrant. This is basically a glorified
(bloated) `grep`.
## Usage
### Setup
#### 1. Environment Variables
Create a `.env` file or set the following environment variables:
```.env
CHUNK_SIZE=4096
CHUNK_OVERLAP=256
ENCODER_MODEL=nomic-embed-text
EMBEDDING_DIM=768
RETRIEVER_TOP_K=15
RETRIEVER_SCORE_THRESHOLD=0.5
RERANK_MODEL=mixedbread-ai/mxbai-rerank-large-v1
RERANK_TOP_K=5
GENERATOR_MODEL=llama3
DOCUMENT_DB_NAME=rag
DOCUMENT_DB_USER=aktersnurra
QDRANT_URL=http://localhost:6333
QDRANT_COLLECTION_NAME=knowledge-base
# Optional: only needed when using Cohere
COHERE_API_KEY=
COHERE_RERANK_MODEL=rerank-english-v3.0
```
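A minimal sketch of how these values could be read in Python, assuming they are already present in the process environment (for example exported by the shell or loaded by a dotenv tool). The `Settings` dataclass and `load_settings` function are hypothetical names used for illustration and are not taken from this repository.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for a subset of the variables listed above."""

    chunk_size: int
    chunk_overlap: int
    encoder_model: str
    embedding_dim: int
    retriever_top_k: int
    retriever_score_threshold: float
    rerank_top_k: int
    generator_model: str
    qdrant_url: str
    qdrant_collection_name: str
    cohere_api_key: Optional[str]  # None means Cohere is not configured


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Convert the string values from the environment into typed settings."""
    settings = Settings(
        chunk_size=int(env["CHUNK_SIZE"]),
        chunk_overlap=int(env["CHUNK_OVERLAP"]),
        encoder_model=env["ENCODER_MODEL"],
        embedding_dim=int(env["EMBEDDING_DIM"]),
        retriever_top_k=int(env["RETRIEVER_TOP_K"]),
        retriever_score_threshold=float(env["RETRIEVER_SCORE_THRESHOLD"]),
        rerank_top_k=int(env["RERANK_TOP_K"]),
        generator_model=env["GENERATOR_MODEL"],
        qdrant_url=env["QDRANT_URL"],
        qdrant_collection_name=env["QDRANT_COLLECTION_NAME"],
        # An empty or missing key is treated as "not configured".
        cohere_api_key=env.get("COHERE_API_KEY") or None,
    )
    if settings.chunk_overlap >= settings.chunk_size:
        raise ValueError("CHUNK_OVERLAP must be smaller than CHUNK_SIZE")
    return settings
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise without touching the real environment.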
#### 2. Install Python Dependencies
```sh
poetry install
```
#### 3. Ollama
Make sure ollama is running:
```sh
ollama serve
```
Download the encoder and generator models with ollama (this assumes `GENERATOR_MODEL` and `ENCODER_MODEL` are set in your shell):
```sh
ollama pull $GENERATOR_MODEL
ollama pull $ENCODER_MODEL
```
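Once the encoder model is pulled, an embedding can be requested over Ollama's HTTP API. The sketch below assumes Ollama listens on its default address (`http://localhost:11434`) and uses the `/api/embeddings` endpoint. The function names are hypothetical, and this is not necessarily how the repository talks to Ollama.

```python
import json
import urllib.request
from typing import List

OLLAMA_HOST = "http://localhost:11434"  # assumption: Ollama's default address


def build_embedding_request(
    text: str, model: str, host: str = OLLAMA_HOST
) -> urllib.request.Request:
    """Build the POST request for Ollama's /api/embeddings endpoint."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        f"{host}/api/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text: str, model: str = "nomic-embed-text") -> List[float]:
    """Send the request and return the embedding vector (needs `ollama serve`)."""
    with urllib.request.urlopen(build_embedding_request(text, model)) as response:
        return json.load(response)["embedding"]
```

With `nomic-embed-text`, the returned vector should have the length configured in `EMBEDDING_DIM`.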
#### 4. Qdrant
Qdrant is used to store the embeddings of the chunks from the documents.
Download and run qdrant.
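The chunks mentioned above are controlled by `CHUNK_SIZE` and `CHUNK_OVERLAP`. As a rough illustration, here is a sliding-window chunker that measures sizes in characters. That unit is an assumption, and `chunk_text` is a hypothetical helper rather than the repository's own splitter.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 4096, chunk_overlap: int = 256) -> List[str]:
    """Split text into windows of chunk_size that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

Each chunk would then be embedded and stored in the Qdrant collection together with its text.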
#### 5. Postgres
Postgres is used to save hashes of the documents to prevent documents from
being added to the vector db more than once.
Download and run postgres.
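The deduplication idea can be sketched as follows. An in-memory set stands in for the Postgres table, and SHA-256 is an assumed choice of hash; the repository's actual hash function and schema are not described here, and both function names are hypothetical.

```python
import hashlib
from typing import Set


def document_hash(content: bytes) -> str:
    """Return a hex digest that identifies the document's exact contents."""
    return hashlib.sha256(content).hexdigest()


def add_if_new(content: bytes, seen_hashes: Set[str]) -> bool:
    """Record the document and return True only if its hash was not seen before."""
    digest = document_hash(content)
    if digest in seen_hashes:
        return False  # already ingested, skip embedding and upload
    seen_hashes.add(digest)
    return True
```

Only documents for which `add_if_new` returns `True` would be chunked and added to the vector db.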
#### 6. Cohere
Get an API key from the Cohere website. This step is optional.
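The `RETRIEVER_TOP_K`, `RETRIEVER_SCORE_THRESHOLD` and `RERANK_TOP_K` settings suggest a retrieve-then-rerank flow, with Cohere as one possible reranker. The sketch below shows that flow in plain Python under that assumption. All names are hypothetical, and `word_overlap` is a toy stand-in for a real reranking model.

```python
from typing import Callable, List, Tuple

Scored = Tuple[str, float]  # (chunk text, score)


def filter_retrieved(
    hits: List[Scored], top_k: int = 15, score_threshold: float = 0.5
) -> List[Scored]:
    """Keep at most top_k hits whose score reaches the threshold, best first."""
    kept = [hit for hit in hits if hit[1] >= score_threshold]
    return sorted(kept, key=lambda hit: hit[1], reverse=True)[:top_k]


def rerank(
    query: str,
    hits: List[Scored],
    score_fn: Callable[[str, str], float],
    top_k: int = 5,
) -> List[Scored]:
    """Rescore every hit against the query and keep the top_k best."""
    rescored = [(text, score_fn(query, text)) for text, _ in hits]
    return sorted(rescored, key=lambda hit: hit[1], reverse=True)[:top_k]


def word_overlap(query: str, text: str) -> float:
    """Toy scorer: number of distinct words shared by query and text."""
    return float(len(set(query.split()) & set(text.split())))
```

A real setup would replace `word_overlap` with a call to the configured rerank model and pass the surviving chunks to the generator as context.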
### Running
Activate the poetry shell:
```sh
poetry shell
```
Use the cli:
```sh
python rag/cli.py
```
or the ui using a browser:
```sh
streamlit run rag/ui.py
```
### Notes
Yes, it is inefficient/dumb to use ollama when you can just load the models with python
in the same process.
### TODO
- [x] Rerank history if it is relevant.
- [x] message ollama/cohere
- [x] create db script
- [x] write a general model for cli/ui
- [ ] ~~use huggingface instead of ollama~~
  - Huggingface is too slow...
- [ ] Refactor messages
- [ ] Rewrite in functional style
- [ ] Try out nemotron-mini
- [ ] Try out llama3-chatqa
### Inspiration
I took some inspiration from these tutorials:
- [rag-openai-qdrant](https://colab.research.google.com/github/qdrant/examples/blob/master/rag-openai-qdrant/rag-openai-qdrant.ipynb)
- [building-rag-application-using-langchain-openai-faiss](https://medium.com/@solidokishore/building-rag-application-using-langchain-openai-faiss-3b2af23d98ba)
- [knowledge_gpt](https://github.com/mmz-001/knowledge_gpt)