An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with multi-modal-learning

A curated list of projects in awesome lists tagged with multi-modal-learning.

https://github.com/lyuchenyang/macaw-llm

Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration

deep-learning language-model machine-learning multi-modal-learning natural-language-processing neural-networks

Last synced: 14 May 2025

https://github.com/lyuchenyang/Macaw-LLM

Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration

deep-learning language-model machine-learning multi-modal-learning natural-language-processing neural-networks

Last synced: 11 Apr 2025

https://github.com/nvlabs/prismer

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

image-captioning language-model multi-modal-learning multi-task-learning vision-and-language vision-language-model vqa

Last synced: 16 May 2025

https://github.com/lucidrains/x-clip

A concise but complete implementation of CLIP with various experimental improvements from recent papers

artificial-intelligence contrastive-learning deep-learning multi-modal-learning zero-shot-learning

Last synced: 01 Apr 2025

https://github.com/dmitryryumin/cvpr-2023-24-papers

CVPR 2023-2024 Papers: Dive into advanced research presented at the leading computer vision conference. Keep up to date with the latest developments in computer vision and deep learning. Code included. ⭐ support visual intelligence development!

action-recognition autonomous-driving biometrics computer-vision cvpr cvpr2023 cvpr2024 datasets deep-learning face-recognition gesture-recognition image-synthesis medical-image-processing multi-modal-learning pattern-recognition scene-analysis segmentation self-supervised-learning shape-analysis video-synthesis

Last synced: 05 Apr 2025

https://github.com/huggingface/chug

Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.

computer-vision dataloading datasets distributed-training document-understanding multi-modal-learning pdf-document webdataset

Last synced: 14 Oct 2025

https://github.com/qizekun/ReCon

[ICML 2023] Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining

3d-point-clouds multi-modal-learning representation-learning self-supervised-learning

Last synced: 20 Mar 2025

https://github.com/rentainhe/trar-vqa

[ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"

attention clevr dynamic-network iccv2021 local-and-global multi-modal multi-modal-learning multi-modality multi-scale-features official pytorch transformer vision-and-language visual-question-answering visualization vqav2

Last synced: 28 Aug 2025

https://github.com/ttgeng233/UnAV

Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)

audio-visual-events audio-visual-learning multi-modal-learning

Last synced: 09 May 2025

https://github.com/ttgeng233/unav

Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)

audio-visual-events audio-visual-learning multi-modal-learning

Last synced: 26 Sep 2025

https://github.com/3dlg-hcvc/duoduoclip

[ICLR 2025] Duoduo CLIP: Efficient 3D Understanding with Multi-View Images

3d-classification 3d-shape-retrieval 3d-understanding clip multi-modal-learning pytorch

Last synced: 05 Apr 2025

https://github.com/kyegomez/megavit

The open source implementation of the model from "Scaling Vision Transformers to 22 Billion Parameters"

artificial-intelligence computer-vision gpt4 multi-modal multi-modal-fusion multi-modal-learning vision-and-language vision-transformer

Last synced: 19 Jul 2025

https://github.com/filipbasara0/simple-clip

A minimal but effective implementation of CLIP (Contrastive Language-Image Pretraining) in PyTorch

contrastive-learning deep-learning machine-learning multi-modal-learning pytorch representation-learning self-supervised-learning siglip zero-shot-classification

Last synced: 11 Apr 2025

https://github.com/depshad/deep-learning-framework-for-multi-modal-product-classification

Code repository for Rakuten Data Challenge: Multimodal Product Classification and Retrieval.

computer-vision deep-learning multi-modal-learning nlp pytorch rakuten-data-challenge

Last synced: 04 Sep 2025

https://github.com/kyegomez/neva

The open source implementation of "NeVA: NeMo Vision and Language Assistant"

artificial-intelligence cuda gpt4 multi-modal multi-modal-learning multithreading neva nvidia robotics

Last synced: 15 Oct 2025

https://github.com/agora-lab-ai/ekr

Elysium Knowledge Repository is an open source initiative to embed all of Humanity's multi-modal knowledge and wisdom.

artificial-intelligence chroma embeddings multi-modal-learning multimodal pinecone vectordatabase

Last synced: 10 Aug 2025

https://github.com/sayakpaul/multimodal-entailment-baseline

This repository shows how to implement a basic model for multimodal entailment.

entailment keras multi-modal-learning tensorflow

Last synced: 07 May 2025

https://github.com/fullscreen-triangle/four-sided-triangle

A sophisticated multi-model optimization pipeline for domain-expert knowledge extraction RAG systems

ai claude-3-5-sonnet domain-experts multi-modal-learning openai optimization-algorithms prompt-engineering prompt-tuning rag retrieval-augmented-generation

Last synced: 22 Jun 2025

https://github.com/lyuchenyang/semantic-aware-videoqa

Code for ACL SRW 2023 paper "Semantic-aware Dynamic Retrospective-Prospective Reasoning for Event-level Video Question Answering"

artificial-intelligence deep-learning machine-learning multi-modal-learning natural-language-processing video-question-answering

Last synced: 15 Mar 2025

https://github.com/ammarlodhi255/metadata-augmented-neural-networks-for-wild-animal-classification

This repository contains the implementation code for the paper "Metadata Augmented Neural Networks For Wild Animal Classification": https://www.sciencedirect.com/science/article/pii/S1574954124003479.

deep-learning fusion-techniques metadata metadata-fusion multi-modal multi-modal-learning wild-animal-classification wild-life-monitoring

Last synced: 25 May 2026

https://github.com/amazon-science/contrastive_emc2

Code for the ICML 2024 paper: "EMC^2: Efficient MCMC Negative Sampling for Contrastive Learning with Global Convergence"

contrastive-learning deep-neural-networks machine-learning machine-learning-algorithms mcmc-sampling multi-modal multi-modal-learning

Last synced: 04 Oct 2025

https://github.com/lyuchenyang/efficient-videoqa

Code for ACL SustaiNLP 2023 paper "Is a Video worth n × n Images? A Highly Efficient Approach to Transformer-based Video Question Answering"

artificial-intelligence deep-learning machine-learning multi-modal-learning natural-language-processing video-question-answering

Last synced: 15 Mar 2025

https://github.com/jianzhnie/multimodaltransformers

lmmtoolkit is a toolkit for Multi-Modal Learning

image-text multi-modal-learning text-image text-to-video

Last synced: 15 Sep 2025

https://github.com/stifler7/multi-modal-learning-for-image-and-text-analysis

Develops approaches for jointly analyzing images and text using deep learning. Covers applications like image-text matching, visual question answering, image captioning, and sentiment analysis with visual context.

machine-learning multi-modal-learning

Last synced: 24 Mar 2025

https://github.com/loharmurtaza/fog_detection_subject_dependent

This repository is based on my research work "Detecting Freezing of Gait in Parkinson's Disease Patients Using Multi-Modal Machine Learning"

accelerometer detection eeg emg f1-score freezing-of-gait gyroscope machine-learning mfcc multi-modal-learning rf sensitivity skin-conductance specificity svm

Last synced: 20 Jan 2026

https://github.com/loharmurtaza/FoG_detection_subject_dependent

This repository is based on my research work "Detecting Freezing of Gait in Parkinson's Disease Patients Using Multi-Modal Machine Learning"

accelerometer detection eeg emg f1-score freezing-of-gait gyroscope machine-learning mfcc multi-modal-learning rf sensitivity skin-conductance specificity svm

Last synced: 29 Sep 2025