https://github.com/bbc-esq/vectordb-plugin

Plugin that lets you ask questions about your documents including audio and video files.
https://github.com/bbc-esq/vectordb-plugin

bark database-management embedding-models embedding-vectors embeddings gtts koboldai koboldcpp python rag retrieval-augmented-generation retrieval-chatbot tiledb vector-data-management vector-database vector-search vision whisper whispers2t whisperspeech

Last synced: about 2 months ago
JSON representation

Plugin that lets you ask questions about your documents including audio and video files.

Host: GitHub
URL: https://github.com/bbc-esq/vectordb-plugin
Owner: BBC-Esq
Created: 2023-08-13T03:21:20.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2025-05-15T14:00:17.000Z (about 2 months ago)
Last Synced: 2025-05-16T09:02:08.922Z (about 2 months ago)
Topics: bark, database-management, embedding-models, embedding-vectors, embeddings, gtts, koboldai, koboldcpp, python, rag, retrieval-augmented-generation, retrieval-chatbot, tiledb, vector-data-management, vector-database, vector-search, vision, whisper, whispers2t, whisperspeech
Language: Python
Homepage: https://www.youtube.com/@AI_For_Lawyers
Size: 32.8 MB
Stars: 336
Watchers: 7
Forks: 42
Open Issues: 10
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

🚀 Supercharged Vector Database!

Requirements
•
Installation
•
Using the Program
•
Request a Feature or Report a Bug
•
Contact

Create and search a vector database to get a response from the large language model that's more accurate. This is commonly referred to as "retrieval augmented generation" (RAG)! You can watch an introductory [Video](https://www.youtube.com/watch?v=8-ZAYI4MvtA) or read a [Medium article](https://medium.com/@vici0549/search-images-with-vector-database-retrieval-augmented-generation-rag-3d5a48881de5) about the program.

Graphic of How This Program Works

![image](https://github.com/user-attachments/assets/b3784da7-91a5-426b-882c-756468ffdc20)

Requirements

| [🐍 Python 3.11](https://www.python.org/downloads/release/python-3119/) or [Python 3.12](https://www.python.org/downloads/release/python-3129/) • [📁 Git](https://git-scm.com/downloads) • [📁 Git LFS](https://git-lfs.com/) • [🌐 Pandoc](https://github.com/jgm/pandoc/releases) • [🛠️ Compiler](https://visualstudio.microsoft.com/) |
|---|

The above link downloads Visual Studio as an example. Make sure to install the required SDKs, however.

>
> EXAMPLE error when no compiler installed:
>
>
>
>
> EXAMPLE of installing the correct SDKs:
>
>

[Back to Top](#top)

Installation

### Step 1
Download the ZIP file for the latest "release." Extract its contents and navigate to the `src` folder.
> [!CAUTION]
> If you simply clone this repository you will get the development version, which might not be stable.
### Step 2
Within the `src` folder, create a [virtual environment](https://realpython.com/python-virtual-environments-a-primer/):
```
python -m venv .
```
### Step 3
Activate the virtual environment:
```
.\Scripts\activate
```
### Step 4
Run the setup script:
> Only ```Windows``` is supported for now.

```
python setup_windows.py
```

[Back to Top](#top)

🖥️Usage🖥️

> [!IMPORTANT]
> Instructions on how to use the program are being consolidated into the `Ask Jeeves` functionality, which can be accessed from the "Ask Jeeves" menu option. Please create an issue if Jeeves is not working.

### Start the Program
```
.\Scripts\activate
```
```
python gui.py
```

### 🏗️ Create a Vector Database Download an embedding model from the ```Models Tab```.
1. Set the `chunk size` and `chunk overlap` settings within the `Settings Tab`.
2. Within the `Create Database Tab`, select the files that you want in the vector database.
> 🖼️ images can be selected by clicking the `Choose Files` button.\
> 🎵 Audio files must be transcribed first within the `Tools Tab`.
3. Select the embedding model you want to use.
4. Click `Create Vector Database`.

### 🔍 Query a Vector Database
* Select the database you want to search within the `Query Database Tab`.
* Select `Local Models`, `Kobold`, `LM Studio` or `ChatGPT` for the backend that you want to provide a response to your question.
* Click `Submit Question`.
> The `chunks only` checkbox will display the results from the vector database without getting a response.

### ❓ Which Backend Should I Use?
If you use either the `Kobold` or `LM Studio` you must be familiar with those programs. For example, `LM Studio` must be running in "server mode" and handles the prompt formatting. However,`Kobold` automatically starts in server mode but requires you to specify the prompt formatting.
> [!TIP]
> Kobold [home page](https://github.com/LostRuins/koboldcpp), [instructions](https://github.com/LostRuins/koboldcpp/wiki), and [Discord server](https://koboldai.org/discord)\
> LM Studio [home page](https://lmstudio.ai/), [instructions](https://lmstudio.ai/docs), and [Discord server](https://discord.gg/aPQfnNkxGC).

### 🗑️ Deleting a Database
* In the `Manage Databases Tab`, select a database and click `Delete Database`.

[Back to Top](#top)

## Request a Feature or Report a Bug

Feel free to report bugs or request enhancements by creating an issue on github and I will respond promptly.

CONTACT

I welcome all suggestions - both positive and negative. You can e-mail me directly at "[email protected]" or I can frequently be seen on the ```KoboldAI``` Discord server (moniker is ```vic49```). I am always happy to answer any quesitons or discuss anything vector database related! (no formal affiliation with ```KoboldAI```).