An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with cross-modal-retrieval

A curated list of projects in awesome lists tagged with cross-modal-retrieval.

https://github.com/yehli/xmodaler

X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).

cross-modal-retrieval image-captioning pretraining tden video-captioning vision-and-language visual-question-answering

Last synced: 12 Apr 2025

https://github.com/layumi/Image-Text-Embedding

TOMM2020 Dual-Path Convolutional Image-Text Embedding with Instance Loss https://arxiv.org/abs/1711.05535

bidirectional-retrieval cross-modal-retrieval cross-modality image-retrieval image-search language-retrieval matconvnet matlab person-reidentification visual-semantic

Last synced: 16 Apr 2025

https://github.com/jpthu17/emcl

[NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations

cross-modal-retrieval neurips video-captioning video-question-answering video-retrieval

Last synced: 30 Apr 2025

https://github.com/jpthu17/diffusionret

[ICCV 2023] DiffusionRet: Generative Text-Video Retrieval with Diffusion Model

cross-modal-retrieval diffusion-models iccv2023 video-retrieval

Last synced: 30 Apr 2025

https://github.com/yalesong/pvse

Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval (CVPR 2019)

cross-modal-retrieval metric-learning mrw-dataset mscoco-dataset tgif-dataset

Last synced: 15 Dec 2024

https://github.com/jpthu17/hbi

[CVPR 2023 Highlight & TPAMI] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning

cross-modal-retrieval cvpr video-question-answering video-retrieval

Last synced: 06 Apr 2025

https://github.com/jpthu17/dicosa

[IJCAI 2023] Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment

cross-modal-retrieval ijcai video-retrieval

Last synced: 30 Apr 2025

https://github.com/peri044/stt

A multi-task model that performs image captioning, sentence paraphrasing, and cross-modal retrieval.

common-vector-space cross-modal-retrieval deep-learning image-captioning sentence-paraphrasing sequence-to-sequence

Last synced: 26 Dec 2024

https://github.com/haomo-ai/ModaLink

[IROS 2024] Implementation of the paper "ModaLink: Unifying Modalities for Efficient Image-to-PointCloud Place Recognition"

camera-to-lidar cross-modal-localization cross-modal-retrieval place-recognition

Last synced: 29 Jan 2025

https://github.com/alipay/pc2-noiseofweb

Noise of Web (NoW) is a challenging noisy correspondence learning (NCL) benchmark containing 100K image-text pairs for robust image-text matching/retrieval models.

acmmm acmmm2024 benchmark captioning-images cross-modal-retrieval dataset image-text-matching image-text-retrieval multimodal-learning noisy-correspondence

Last synced: 25 Apr 2025

https://github.com/buaadreamer/spn4cir

[ACM MM 2024] Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives

acmmm2024 blip blip2 clip composed-image-retrieval cross-modal-retrieval data-generation image-retrieval llama llava memory-bank multi-modal-retrieval multimodal-learning transformer

Last synced: 05 Jan 2025

https://github.com/prithivirajdamodaran/whatthefood

An intentionally simple Image to Food cross-modal search. Created by Prithiviraj Damodaran.

cross-modal cross-modal-learning cross-modal-retrieval multimodal

Last synced: 21 Mar 2025

https://github.com/yur1g4/as

The "as" keyword in programming languages is commonly used for type conversion and type assertion operations. It allows developers to explicitly convert one data type to another or assert that an interface value holds a specific underlying data type.

asp aspnetcore automotive cross-modal-retrieval csharp deep-learning graphics help-wanted linux onnx python sentence2vec stylesheets vcv-rack-plugins

Last synced: 23 Feb 2025