{"id":23057469,"url":"https://github.com/concaption/text2img-search","last_synced_at":"2025-08-07T01:13:02.192Z","repository":{"id":197487084,"uuid":"698715074","full_name":"concaption/text2img-search","owner":"concaption","description":"text to image search ai app using haystack and clip","archived":false,"fork":false,"pushed_at":"2023-10-03T23:39:21.000Z","size":2058,"stargazers_count":6,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-28T11:17:12.635Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/concaption.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-09-30T18:31:44.000Z","updated_at":"2025-04-08T02:24:14.000Z","dependencies_parsed_at":null,"dependency_job_id":"c2bfb047-0718-415f-931d-6a5b496f22db","html_url":"https://github.com/concaption/text2img-search","commit_stats":null,"previous_names":["concaption/text2img-search"],"tags_count":1,"template":false,"template_full_name":"concaption/python-template","purl":"pkg:github/concaption/text2img-search","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/concaption%2Ftext2img-search","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/concaption%2Ftext2img-search/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/concaption%2Ftext2img-search/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/concaption%2Ftext2img-
search/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/concaption","download_url":"https://codeload.github.com/concaption/text2img-search/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/concaption%2Ftext2img-search/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":269182211,"owners_count":24374079,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-06T02:00:09.910Z","response_time":99,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-16T02:12:24.374Z","updated_at":"2025-08-07T01:13:02.124Z","avatar_url":"https://github.com/concaption.png","language":"Python","readme":"# Multimodal Search with Haystack: A Step-By-Step Guide\n\n![Screenshot](assets/screenshot.png)\n![API Screenshot](assets/api.png)\n\n## Setup\n\nFor the Streamlit app:\n\n```\nmake setup\nmake run\n```\n\nFor the FastAPI app:\n\n```\nmake run-api\n```\n\nFor the dockerized API:\n\n```\nmake docker-api\n```\n\n## Introduction\n\nToday's data is not limited to text: we also have a wealth of multimedia content such as images, audio, and video. A search mechanism that can look across multiple types of media is therefore more useful than ever. In this tutorial, we will build a multimodal text-to-image search capability using Haystack's `MultiModalRetriever`. 
We will use Python throughout this tutorial.\n\n## Technology Stack\n\n- Python: The programming language used for this project.\n- Haystack: An open-source framework for building search systems.\n- Sentence Transformers: Provides the CLIP-ViT-B-32 model used to compute embeddings.\n\n## Step 1: Setup\n\nFirst, install the Haystack library if you haven't already:\n\n```bash\npip install farm-haystack\n```\n\n## Step 2: Import Necessary Modules\n\n```python\nimport os\nfrom haystack import Document\nfrom haystack import Pipeline\nfrom haystack.document_stores import InMemoryDocumentStore\nfrom haystack.nodes.retriever.multimodal import MultiModalRetriever\n```\n\n## Step 3: Create the MultiModalSearch Class\n\nHere is the complete code with detailed comments.\n\n```python\nclass MultiModalSearch:\n    def __init__(self):\n        # CLIP ViT-B-32 produces 512-dimensional embeddings\n        self.document_store = InMemoryDocumentStore(embedding_dim=512)\n        doc_dir = \"./data\"\n        # Index every .jpg in the data directory as an image Document\n        images = [\n            Document(content=f\"{doc_dir}/{filename}\", content_type=\"image\", meta={\"name\": filename})\n            for filename in os.listdir(doc_dir)\n            if filename.endswith(\".jpg\")\n        ]\n        self.document_store.write_documents(images)\n        # Text queries and image documents are embedded into the same CLIP space\n        self.retriever = MultiModalRetriever(\n            query_embedding_model=\"sentence-transformers/clip-ViT-B-32\",\n            query_type=\"text\",\n            document_embedding_models={\n                \"image\": \"sentence-transformers/clip-ViT-B-32\",\n            },\n            document_store=self.document_store,\n        )\n        # Compute and store embeddings for all indexed images\n        self.document_store.update_embeddings(self.retriever)\n        self.pipeline = Pipeline()\n        self.pipeline.add_node(component=self.retriever, name=\"Retriever\", inputs=[\"Query\"])\n\n    def search(self, query):\n        # Retrieve the top 3 matches and return them sorted by similarity score\n        prediction = self.pipeline.run(query=query, params={\"Retriever\": {\"top_k\": 3}})\n        return sorted(prediction[\"documents\"], key=lambda x: x.score, reverse=True)\n```\n\n## Step 4: 
Use Cases\n\n1. **E-commerce Platforms**: When users want to find products similar to a reference image or description.\n2. **Media Libraries**: To search for images or videos based on textual queries or vice versa.\n3. **Research**: For tasks like object identification in images based on textual descriptions.\n\n## Conclusion\n\nMultimodal search is becoming increasingly important as we deal with varied types of data. With frameworks like Haystack, building such capabilities has become more straightforward than ever.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fconcaption%2Ftext2img-search","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fconcaption%2Ftext2img-search","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fconcaption%2Ftext2img-search/lists"}