https://github.com/concaption/text2img-search
text to image search ai app using haystack and clip
https://github.com/concaption/text2img-search
Last synced: 9 months ago
JSON representation
text to image search ai app using haystack and clip
- Host: GitHub
- URL: https://github.com/concaption/text2img-search
- Owner: concaption
- Created: 2023-09-30T18:31:44.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-10-03T23:39:21.000Z (over 2 years ago)
- Last Synced: 2025-04-28T11:17:12.635Z (12 months ago)
- Language: Python
- Size: 1.96 MB
- Stars: 6
- Watchers: 1
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Multimodal Search with Haystack: A Step-By-Step Guide


## Setup
For streamlit app
```
make setup
make run
```
For FASTapi
```
make run-api
```
For dockerized API
```
make docker-api
```
## Introduction
In today's world, data is not limited to text. We have a plethora of multimedia content such as images, audio, and video. Therefore, having a search mechanism that can look into multiple types of media is more useful than ever. In this tutorial, we will focus on creating a multimodal search capability using Haystack's `MultiModalRetriever`. We will be using Python for this tutorial.
## Technology Stack
- Python: The programming language used for this project.
- Haystack: An open-source framework for building search systems.
- Sentence Transformers: For using the CLIP-ViT-B-32 model to get embeddings.
## Step 1: Setup
Firstly, you'll need to install the Haystack library if you haven't already:
```bash
pip install farm-haystack
```
## Step 2: Import Necessary Modules
```python
import os
from haystack import Document
from haystack import Pipeline
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes.retriever.multimodal import MultiModalRetriever
```
## Step 3: Create the MultiModalSearch Class
Here is the complete code with detailed comments.
```python
class MultiModalSearch:
def __init__(self):
self.document_store = InMemoryDocumentStore(embedding_dim = 512)
doc_dir = "./data"
images = [
Document(content=f"./{doc_dir}/{filename}", content_type="image", meta={"name": filename})
for filename in os.listdir(doc_dir)
if filename.endswith(".jpg")
]
self.document_store.write_documents(images)
self.retriever = MultiModalRetriever(
query_embedding_model="sentence-transformers/clip-ViT-B-32",
query_type= "text",
document_embedding_models={
"image": "sentence-transformers/clip-ViT-B-32",
},
document_store=self.document_store,
)
self.document_store.update_embeddings(self.retriever)
self.pipeline = Pipeline()
self.pipeline.add_node(component=self.retriever, name="Retriever", inputs=["Query"])
def search(self, query):
prediction = self.pipeline.run(query=query, params={"Retriever": {"top_k": 3}})
return sorted(prediction["documents"], key=lambda x: x.score, reverse=True)
```
## Step 4: Use Cases
1. **E-commerce Platforms**: When users want to find products similar to a reference image or description.
2. **Media Libraries**: To search for images or videos based on textual queries or vice versa.
3. **Research**: For tasks like object identification in images based on textual descriptions.
## Conclusion
Multimodal search is becoming increasingly important as we deal with varied types of data. With frameworks like Haystack, building such capabilities has become more straightforward than ever.