https://github.com/concaption/text2img-search

text to image search ai app using haystack and clip
https://github.com/concaption/text2img-search

Last synced: 9 months ago
JSON representation

text to image search ai app using haystack and clip

Host: GitHub
URL: https://github.com/concaption/text2img-search
Owner: concaption
Created: 2023-09-30T18:31:44.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2023-10-03T23:39:21.000Z (over 2 years ago)
Last Synced: 2025-04-28T11:17:12.635Z (12 months ago)
Language: Python
Size: 1.96 MB
Stars: 6
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

          # Multimodal Search with Haystack: A Step-By-Step Guide

![Screenshot](assets/screenshot.png)

![API Screenshot](assets/api.png)

## Setup

For streamlit app

```

make setup

make run

```

For FASTapi

```

make run-api

```

For dockerized API

```

make docker-api

```

## Introduction

In today's world, data is not limited to text. We have a plethora of multimedia content such as images, audio, and video. Therefore, having a search mechanism that can look into multiple types of media is more useful than ever. In this tutorial, we will focus on creating a multimodal search capability using Haystack's `MultiModalRetriever`. We will be using Python for this tutorial.

## Technology Stack

- Python: The programming language used for this project.

- Haystack: An open-source framework for building search systems.

- Sentence Transformers: For using the CLIP-ViT-B-32 model to get embeddings.

## Step 1: Setup

Firstly, you'll need to install the Haystack library if you haven't already:

```bash

pip install farm-haystack

```

## Step 2: Import Necessary Modules

```python

import os

from haystack import Document

from haystack import Pipeline

from haystack.document_stores import InMemoryDocumentStore

from haystack.nodes.retriever.multimodal import MultiModalRetriever

```

## Step 3: Create the MultiModalSearch Class

Here is the complete code with detailed comments.

```python

class MultiModalSearch:

    def __init__(self):

        self.document_store = InMemoryDocumentStore(embedding_dim = 512)

        doc_dir = "./data"

        images = [

            Document(content=f"./{doc_dir}/{filename}", content_type="image", meta={"name": filename})

            for filename in os.listdir(doc_dir)

            if filename.endswith(".jpg")

        ]

        self.document_store.write_documents(images)

        self.retriever = MultiModalRetriever(

            query_embedding_model="sentence-transformers/clip-ViT-B-32",

            query_type= "text",

            document_embedding_models={

                "image": "sentence-transformers/clip-ViT-B-32",

            },

            document_store=self.document_store,

        )

        self.document_store.update_embeddings(self.retriever)

        self.pipeline = Pipeline()

        self.pipeline.add_node(component=self.retriever, name="Retriever", inputs=["Query"])

    def search(self, query):

        prediction = self.pipeline.run(query=query, params={"Retriever": {"top_k": 3}})

        return sorted(prediction["documents"], key=lambda x: x.score, reverse=True)

```

## Step 4: Use Cases

1. **E-commerce Platforms**: When users want to find products similar to a reference image or description.

2. **Media Libraries**: To search for images or videos based on textual queries or vice versa.

3. **Research**: For tasks like object identification in images based on textual descriptions.

## Conclusion

Multimodal search is becoming increasingly important as we deal with varied types of data. With frameworks like Haystack, building such capabilities has become more straightforward than ever.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/concaption/text2img-search

Awesome Lists containing this project

README