Projects in Awesome Lists tagged with cross-modal-retrieval
A curated list of projects in awesome lists tagged with cross-modal-retrieval.
https://github.com/jina-ai/clip-as-service
🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP
bert bert-as-service clip-as-service clip-model cross-modal-retrieval cross-modality deep-learning image2vec multi-modality neural-search onnx openai pytorch sentence-encoding sentence2vec
Last synced: 22 Apr 2025
https://github.com/hanxiao/bert-as-service
🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP
bert bert-as-service clip-as-service clip-model cross-modal-retrieval cross-modality deep-learning image2vec multi-modality neural-search onnx openai pytorch sentence-encoding sentence2vec
Last synced: 31 Mar 2025
https://github.com/yehli/xmodaler
X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).
cross-modal-retrieval image-captioning pretraining tden video-captioning vision-and-language visual-question-answering
Last synced: 12 Apr 2025
https://github.com/YehLi/xmodaler
X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).
cross-modal-retrieval image-captioning pretraining tden video-captioning vision-and-language visual-question-answering
Last synced: 02 Apr 2025
https://github.com/layumi/Image-Text-Embedding
TOMM2020 Dual-Path Convolutional Image-Text Embedding with Instance Loss 🐾 https://arxiv.org/abs/1711.05535
bidirectional-retrieval cross-modal-retrieval cross-modality image-retrieval image-search language-retrieval matconvnet matlab person-reidentification visual-semantic
Last synced: 16 Apr 2025
https://github.com/layumi/image-text-embedding
TOMM2020 Dual-Path Convolutional Image-Text Embedding with Instance Loss 🐾 https://arxiv.org/abs/1711.05535
bidirectional-retrieval cross-modal-retrieval cross-modality image-retrieval image-search language-retrieval matconvnet matlab person-reidentification visual-semantic
Last synced: 12 Apr 2025
https://github.com/jpthu17/emcl
[NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations
cross-modal-retrieval neurips video-captioning video-question-answering video-retrieval
Last synced: 30 Apr 2025
https://github.com/jpthu17/diffusionret
[ICCV 2023] DiffusionRet: Generative Text-Video Retrieval with Diffusion Model
cross-modal-retrieval diffusion-models iccv2023 video-retrieval
Last synced: 30 Apr 2025
https://github.com/yalesong/pvse
Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval (CVPR 2019)
cross-modal-retrieval metric-learning mrw-dataset mscoco-dataset tgif-dataset
Last synced: 15 Dec 2024
https://github.com/jpthu17/hbi
[CVPR 2023 Highlight & TPAMI] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
cross-modal-retrieval cvpr video-question-answering video-retrieval
Last synced: 06 Apr 2025
https://github.com/jpthu17/dicosa
[IJCAI 2023] Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment
cross-modal-retrieval ijcai video-retrieval
Last synced: 30 Apr 2025
https://github.com/peri044/stt
A multi-task model which does image captioning, sentence paraphrasing and cross-modal retrieval.
common-vector-space cross-modal-retrieval deep-learning image-captioning sentence-paraphrasing sequence-to-sequence
Last synced: 26 Dec 2024
https://github.com/haomo-ai/ModaLink
[IROS 2024] Implementation of the paper "ModaLink: Unifying Modalities for Efficient Image-to-PointCloud Place Recognition"
camera-to-lidar cross-modal-localization cross-modal-retrieval place-recognition
Last synced: 29 Jan 2025
https://github.com/alipay/pc2-noiseofweb
Noise of Web (NoW) is a challenging noisy correspondence learning (NCL) benchmark containing 100K image-text pairs for robust image-text matching/retrieval models.
acmmm acmmm2024 benchmark captioning-images cross-modal-retrieval dataset image-text-matching image-text-retrieval multimodal-learning noisy-correspondence
Last synced: 25 Apr 2025
https://github.com/buaadreamer/spn4cir
[ACM MM 2024] Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives
acmmm2024 blip blip2 clip composed-image-retrieval cross-modal-retrieval data-generation image-retrieval llama llava memory-bank multi-modal-retrieval multimodal-learning transformer
Last synced: 05 Jan 2025
https://github.com/buaadreamer/ccrk
[KDD 2024] Improving the Consistency in Cross-Lingual Cross-Modal Retrieval with 1-to-K Contrastive Learning
cross-lingual cross-lingual-retrieval cross-modal cross-modal-retrieval iglue image-text-retrieval image-text-search kdd2024 mscoco multi30k retrieval swin-transformer vision-language-pretraining wit xflickrco xlm-roberta
Last synced: 11 Apr 2025
https://github.com/prithivirajdamodaran/whatthefood
An intentionally simple Image to Food cross-modal search. Created by Prithiviraj Damodaran.
cross-modal cross-modal-learning cross-modal-retrieval multimodal
Last synced: 21 Mar 2025
https://github.com/yur1g4/as
The "as" keyword in programming languages is commonly used for type conversion and type assertion operations. It allows developers to explicitly convert one data type to another or assert that an interface value holds a specific underlying data type.
asp aspnetcore automotive cross-modal-retrieval csharp deep-learning graphics help-wanted linux onnx python sentence2vec stylesheets vcv-rack-plugins
Last synced: 23 Feb 2025