Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/comhendrik/vectormatch
Text Embedding and Search with PostgreSQL and Hugging Face in Docker. This project demonstrates a Python script that embeds text using a model from Hugging Face, stores the embeddings in PostgreSQL with the pgvector extension, and allows searching the database using regular text queries by comparing embeddings.
embeddings-similarity nlp pgvecto-rs postgresql python vector vector-database
Last synced: 4 days ago
- Host: GitHub
- URL: https://github.com/comhendrik/vectormatch
- Owner: comhendrik
- Created: 2024-09-12T11:40:29.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2024-11-28T21:03:51.000Z (26 days ago)
- Last Synced: 2024-11-28T21:26:44.903Z (26 days ago)
- Topics: embeddings-similarity, nlp, pgvecto-rs, postgresql, python, vector, vector-database
- Language: Python
- Homepage:
- Size: 6.84 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Text Embedding and Search with PostgreSQL and Hugging Face in Docker
This project demonstrates a Python script that embeds text using a model from Hugging Face, stores the embeddings in PostgreSQL with the `pgvector` extension, and searches the database with plain-text queries by comparing embeddings. After the matching entries are retrieved, a local LLM served by Ollama generates a response. The project runs with Docker Compose.
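To illustrate what "comparing embeddings" means, here is a minimal, dependency-free sketch of a nearest-neighbour search. It is not the project's actual code. It assumes cosine distance as the metric (the one `pgvector` exposes through its `<=>` operator), although this README does not state which metric the script uses. The function names `cosine_distance` and `search` and the three-dimensional toy vectors are hypothetical stand-ins for real model output and database rows.

```python
import math


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def search(query_embedding, rows, limit=2):
    """Return the `limit` texts whose embeddings are closest to the query.

    `rows` is a list of (text, embedding) pairs, standing in for a table
    with a text column and a vector column (an assumption for illustration).
    """
    ranked = sorted(rows, key=lambda row: cosine_distance(query_embedding, row[1]))
    return [text for text, _ in ranked[:limit]]


# Toy 3-dimensional "embeddings"; a real model produces far more dimensions.
rows = [
    ("cats are small felines", [0.9, 0.1, 0.0]),
    ("dogs are loyal pets", [0.7, 0.3, 0.1]),
    ("postgres stores rows", [0.0, 0.2, 0.9]),
]
query = [0.8, 0.2, 0.0]  # hypothetical embedding of the user's search text

print(search(query, rows))
```

In the database, the same ranking would typically be done by ordering rows by the distance between the stored vector column and the query vector, so that sorting happens inside PostgreSQL rather than in Python.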
## Features
- **Embeddings:** Use Hugging Face's transformers to embed input text.
- **PostgreSQL with pgvector:** Store embeddings in a PostgreSQL database using the `pgvector` extension to perform vector-based searches.
- **Search Functionality:** Retrieve database entries by comparing the input text's embedding to the stored embeddings.
- **Docker Support:** Run the whole application with Docker Compose.
- **Ollama:** Generate a response with a local LLM.
## Prerequisites
Make sure you have the following installed:
- **Docker**
### Setup
Clone the repository:
```
git clone https://github.com/comhendrik/vectorMatch.git
```
Start Docker, change into the project directory, and run the Compose file:
```
docker compose up
```
Wait for the script to finish, which can take a few minutes, then attach to the vectorMatch container:
```
docker attach vectormatch-vector-match-1
```