https://github.com/werediver/qas
A retrieval-augmented question answering system
https://github.com/werediver/qas
Last synced: about 1 year ago
JSON representation
A retrieval-augmented question answering system
- Host: GitHub
- URL: https://github.com/werediver/qas
- Owner: werediver
- License: mit
- Created: 2023-12-21T19:02:46.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-02-18T16:18:49.000Z (over 2 years ago)
- Last Synced: 2024-10-19T08:13:37.853Z (over 1 year ago)
- Language: Python
- Size: 92.8 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# A retrieval-augmented question-answering system
The goal of this project is to implement a retrieval-augmented (RAG) question-answering system that uses local documents or a remote wiki-like system (e.g. Confluence) to augment the user requests with relevant context before passing them to an LLM.
The current implementation relies on [Ollama](https://github.com/jmorganca/ollama) for text generation and [fastembed](https://github.com/qdrant/fastembed) / [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) for text embedding.
Using an OpenAI API-like provider instead of Ollama (e.g. [LM Studio](https://lmstudio.ai/)) is easily possible.
## How to run
You'll need [PDM](https://github.com/pdm-project/pdm) and Ollama installed in your system.
First, install the project dependencies by executing the following command in the project directory:
```
pdm install
```
Then make sure Ollama has Mistral model downloaded by executing the following command:
```
ollama pull mistral
```
Make sure Ollama server is running by executing the following command or in any other way:
```
ollama serve
```
Finally, to run the app execute the following command:
```
env DOCS="path/to/txt/or/md/docs" pdm run src/app.py
```
## How to load Confluence pages
`confluence_md` package can be used as a CLI tool to download Confluence pages as Markdown files with metadata in stored YAML front matter.
Make sure to set the following environment variables put them in `.env` file in the project root:
- `URL`, Confluence server URL
- `CLIENT_ID`, Confluence user name
- `ACCESS_TOKEN`, personal access token
- `DUMP_DIR`, directory to write Markdown files to
To start downloading, execute the following command:
```
date; time pdm run python -m src.confluence_md
```