https://github.com/itspreto/vectr8
Embed anything.
- Host: GitHub
- URL: https://github.com/itspreto/vectr8
- Owner: itsPreto
- Created: 2024-05-20T07:16:35.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2024-05-24T00:18:31.000Z (7 months ago)
- Last Synced: 2024-10-11T18:56:25.997Z (2 months ago)
- Topics: embeddings, llms, vector-database, vector-database-embedding, vector-similarity-search
- Language: JavaScript
- Homepage: https://github.com/itsPreto/VECTR8/blob/main/README.md
- Size: 16.2 MB
- Stars: 29
- Watchers: 2
- Forks: 4
- Open Issues: 6
Metadata Files:
- Readme: README.md
README
VECT.R8 (Vector Embeddings Creation, Transformation & Retrieval) 🚀
A Web UI where you can upload CSV/JSON files, create vector embeddings, and query them. Soon, you'll be able to convert unstructured data to JSON/CSV using an integrated LLM.
⚠️ This project is under heavy, active development and may be unstable. The Embeddings and Query pages are still a work in progress.
Table of Contents
- Prerequisites
- Installation
- Running the Application
- Uploading Files 📂
- Previewing Data 🧐
- Creating Vector Embeddings 🧩
- Querying the Vector Database 🔍
- Managing the Vector Database 🛠️
- UI Walkthrough 🎨
  - Uploading Files
  - Previewing Data
  - Creating Vector Embeddings
  - Querying the Vector Database
  - Managing the Vector Database
-----
Prerequisites
- Python 3.7+: The application requires Python 3.7 or higher.
- Flask: Web framework that serves the backend API (the /upload_file, /query and other endpoints below).
- Flask-CORS: Enables Cross-Origin Resource Sharing (CORS) so the React frontend can call the Flask backend.
- transformers: Creates vector embeddings with pre-trained models.
- torch: Runs the embedding models; required by transformers.
- numpy: Handles arrays and numerical operations throughout the application.
- pandas: Reads and processes CSV and JSON files throughout the application.
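Before installing, it can help to confirm which prerequisites are already importable. The stdlib-only sketch below is not part of the project. The helper names are hypothetical, and the sketch assumes the usual import names for the listed packages (Flask-CORS, for example, is imported as `flask_cors`).

```python
import importlib.util
import sys
from typing import List, Sequence, Tuple

# Assumed import names of the packages listed above.
REQUIRED_MODULES: Tuple[str, ...] = ("flask", "flask_cors", "transformers", "torch", "numpy", "pandas")


def python_version_ok(minimum: Tuple[int, int] = (3, 7)) -> bool:
    """True if the running interpreter meets the minimum (major, minor) version."""
    return sys.version_info[:2] >= minimum


def missing_prerequisites(modules: Sequence[str] = REQUIRED_MODULES) -> List[str]:
    """Return the modules that cannot be found in the current environment."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in modules if importlib.util.find_spec(name) is None]
```

Running `print(python_version_ok(), missing_prerequisites())` in the target environment shows whether the interpreter is new enough and which packages `pip install -r requirements.txt` still needs to provide.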
Installation
- Clone the repository:
  git clone https://github.com/itsPreto/VECTR8.git
  cd VECTR8
- Install the required packages:
  pip install -r requirements.txt
Running the Application
- Start the Flask server:
  python3 rag.py
- The React frontend starts automatically: the Python backend launches it in a separate subprocess.
- Open your web browser and navigate to http://127.0.0.1:4000.
Uploading Files 📂
- Drag and drop a file: Drag and drop a CSV or JSON file into the upload area, or click it to select a file from your computer.
- View uploaded file information: Once the file is uploaded, its name and size are displayed.
Command line:
- To upload a file using curl:
  curl -X POST -F 'file=@/path/to/your/file.csv' http://127.0.0.1:4000/upload_file
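The same upload can be sent from Python using only the standard library. This is a minimal sketch. It assumes the endpoint accepts a standard multipart/form-data form with a field named `file`, which is what the curl `-F 'file=@...'` flag sends. The helper name `build_upload_request` is hypothetical.

```python
import urllib.request
import uuid


def build_upload_request(file_name: str, file_bytes: bytes,
                         url: str = "http://127.0.0.1:4000/upload_file") -> urllib.request.Request:
    """Build a multipart/form-data POST equivalent to `curl -F 'file=@...'`."""
    boundary = uuid.uuid4().hex  # random separator that will not appear in the file
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        # Field name "file" matches the curl example; the filename is sent as-is.
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        file_bytes,
        f"\r\n--{boundary}--\r\n".encode(),  # closing boundary ends the form
    ])
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

To send it, read the file in binary mode and pass the request to `urllib.request.urlopen(build_upload_request("file.csv", data))`.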
Previewing Data 🧐
- Select embedding keys: After uploading a file, select the keys (columns) you want to include in the embeddings.
- Preview document: View a preview of the document created from the selected keys.
- Preview embeddings: View the generated embeddings and token count for the selected document.
Command line:
- To preview a file's keys using curl:
  curl -X POST -H "Content-Type: application/json" -d '{"file_path":"uploads/your-file.csv"}' http://127.0.0.1:4000/preview_file
- To preview a document's embeddings using curl:
  curl -X POST -H "Content-Type: application/json" -d '{"file_path":"uploads/your-file.csv", "selected_keys":["key1", "key2"]}' http://127.0.0.1:4000/preview_document
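The preview step turns each row (or JSON object) into a text document built from the selected keys. The README does not specify the document's exact layout, so the sketch below assumes one `key: value` line per selected key. All helper names are hypothetical. The word count is only a rough proxy, because the real token count comes from the embedding model's tokenizer.

```python
import csv
import io
from typing import Dict, List


def load_csv_records(csv_text: str) -> List[Dict[str, str]]:
    """Parse CSV text into one dict per row (a JSON file would already be a list of dicts)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def build_document(record: Dict[str, object], selected_keys: List[str]) -> str:
    """Join the selected keys of one record into a single text document."""
    # Assumed layout: one "key: value" line per selected key, skipping keys the record lacks.
    return "\n".join(f"{key}: {record[key]}" for key in selected_keys if key in record)


def rough_token_count(document: str) -> int:
    """Whitespace word count: a crude stand-in for the embedding model's tokenizer."""
    return len(document.split())
```

For the CSV row `Hello,World wide,1` with headers `title,body,id`, selecting `["title", "body"]` yields the document `"title: Hello\nbody: World wide"`, which has five whitespace-separated words.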
Creating Vector Embeddings 🧩
- Start embedding creation: Click the "Create Vector DB" button to start the embedding creation process.
- View progress: Monitor the progress of the embedding creation with a circular progress indicator. 📈
Command line:
- To create a vector database using curl:
  curl -X POST -H "Content-Type: application/json" -d '{"file_path":"uploads/your-file.csv", "selected_keys":["key1", "key2"]}' http://127.0.0.1:4000/create_vector_database
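Every JSON endpoint above follows the same request pattern, so a small Python helper can stand in for the curl calls. This is a minimal stdlib sketch, not a client shipped with the project. `build_json_post`, `send`, and `BASE_URL` are hypothetical names, and `send` assumes the server answers with a JSON body.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "http://127.0.0.1:4000"  # host and port taken from the curl examples


def build_json_post(endpoint: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Build a POST with a JSON body, like curl -H 'Content-Type: application/json' -d '...'."""
    return urllib.request.Request(
        f"{BASE_URL}/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> Any:
    """Send the request and decode the response body as JSON (assumed response format)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `send(build_json_post("create_vector_database", {"file_path": "uploads/your-file.csv", "selected_keys": ["key1", "key2"]}))` mirrors the curl command above. The same helper works for `preview_file`, `preview_document`, and `query`.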
Querying the Vector Database 🔍
- Enter a query: Type your query into the input field.
- Select a similarity metric: Choose between cosine similarity and Euclidean distance.
- Submit the query: Click the "Submit" button to query the vector database.
- View results: Inspect the results, which show each document, its score, and a button to view detailed data.
Command line:
- To query the vector database using curl:
  curl -X POST -H "Content-Type: application/json" -d '{"query_text":"Your query text here", "similarity_metric":"cosine"}' http://127.0.0.1:4000/query
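The two metrics rank results differently. Cosine similarity compares only the direction of two vectors, so a higher score is better. Euclidean distance also depends on magnitude, so a lower score is better. The plain-Python sketch below illustrates that difference. It is not taken from the project's source. The function names and the `"euclidean"` metric string are assumptions, since the curl example only shows `"cosine"`.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product divided by the product of the norms; higher means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between the vectors; lower means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank(query: Sequence[float], vectors: List[Sequence[float]],
         metric: str = "cosine", top_k: int = 3) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the top_k best matches under the chosen metric."""
    if metric == "cosine":
        scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
        scored.sort(key=lambda pair: pair[1], reverse=True)  # largest similarity first
    elif metric == "euclidean":
        scored = [(i, euclidean_distance(query, v)) for i, v in enumerate(vectors)]
        scored.sort(key=lambda pair: pair[1])  # smallest distance first
    else:
        raise ValueError(f"unknown metric: {metric}")
    return scored[:top_k]
```

Take the query `[1, 0]`. The vector `[10, 1]` ranks above `[0, 1]` under cosine similarity because it points in nearly the same direction. Under Euclidean distance it ranks below `[0, 1]` because its large magnitude puts it far from the query.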
Managing the Vector Database 🛠️
- Back up the database: Click the "Backup Database" button to create a backup of the current vector database.
- Delete the database: Click the "Delete Database" button to delete the current vector database.
- View database statistics: View statistics such as the total number of documents and the average vector length.
Command line:
- To check whether the vector database exists using curl:
  curl -X GET http://127.0.0.1:4000/check_vector_db
- To view database statistics using curl:
  curl -X GET http://127.0.0.1:4000/db_stats
- To back up the database using curl:
  curl -X POST http://127.0.0.1:4000/backup_db
- To delete the database using curl:
  curl -X POST http://127.0.0.1:4000/delete_db
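The statistics view reports the total number of documents and the average vector length. The README does not define "average vector length", so this sketch computes both plausible readings: the mean number of components (dimensionality) and the mean L2 norm (magnitude). It assumes the embeddings are available as a list of float vectors. The `db_stats` name and its output keys are hypothetical and need not match what `/db_stats` actually returns.

```python
import math
from typing import Dict, List, Sequence


def db_stats(vectors: List[Sequence[float]]) -> Dict[str, float]:
    """Summarize a collection of embedding vectors under both readings of 'vector length'."""
    total = len(vectors)
    if total == 0:
        return {"total_documents": 0, "avg_dimensions": 0.0, "avg_l2_norm": 0.0}
    # Mean number of components per vector (constant for a single embedding model).
    avg_dims = sum(len(v) for v in vectors) / total
    # Mean Euclidean magnitude of the vectors.
    avg_norm = sum(math.sqrt(sum(x * x for x in v)) for v in vectors) / total
    return {"total_documents": total, "avg_dimensions": avg_dims, "avg_l2_norm": avg_norm}
```

Many embedding pipelines normalize their vectors to unit length, and in that case `avg_l2_norm` stays close to 1. The dimensionality reading is then the more informative of the two.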
UI Walkthrough 🎨
- Uploading Files
  - Drag and drop a file into the upload area, or click it to select a file.
  - File information is displayed after a successful upload.
- Previewing Data
  - Select the keys you want to include in the embeddings.
  - View a preview of the document and the generated embeddings.
- Creating Vector Embeddings
  - Click the "Create Vector DB" button to start creating the embeddings.
  - Monitor the progress with the circular progress indicator.
- Querying the Vector Database
  - Enter your query text and select a similarity metric.
  - Click "Submit" to query the database and view the results.
- Managing the Vector Database
  - Back up the database by clicking "Backup Database".
  - Delete the database by clicking "Delete Database".
  - View database statistics such as the total number of documents and the average vector length.