Projects in Awesome Lists tagged with multi-modal-learning
A curated list of projects in awesome lists tagged with multi-modal-learning.
https://github.com/mlfoundations/open_clip
An open source implementation of CLIP.
computer-vision contrastive-loss deep-learning language-model multi-modal-learning pretrained-models pytorch zero-shot-classification
Last synced: 12 May 2025
https://github.com/ofa-sys/chinese-clip
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
chinese clip computer-vision contrastive-loss coreml-models deep-learning image-text-retrieval multi-modal multi-modal-learning nlp pretrained-models pytorch transformers vision-and-language-pre-training vision-language
Last synced: 29 Apr 2025
https://github.com/lyuchenyang/macaw-llm
Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
deep-learning language-model machine-learning multi-modal-learning natural-language-processing neural-networks
Last synced: 14 May 2025
https://github.com/QIN2DIM/hcaptcha-challenger
🥂 Gracefully face hCaptcha challenge with MoE(ONNX) embedded solution.
clip computer-vision hcaptcha hcaptcha-solver image-segmentation multi-modal multi-modal-learning object-detection onnx onnx-models onnxruntime opencv-python playwright solver yolo yolov5 zero-shot-classification
Last synced: 28 Mar 2025
https://github.com/nvlabs/prismer
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
image-captioning language-model multi-modal-learning multi-task-learning vision-and-language vision-language-model vqa
Last synced: 16 May 2025
https://github.com/lucidrains/x-clip
A concise but complete implementation of CLIP with various experimental improvements from recent papers
artificial-intelligence contrastive-learning deep-learning multi-modal-learning zero-shot-learning
Last synced: 01 Apr 2025
https://github.com/kyegomez/zeta
Build high-performance AI models with modular building blocks
artificial-intelligence deep-learning gpt4 llama2 longnet multi-agent-systems multi-modal multi-modal-learning multi-platform pytorch speech-recognition transformer transformers
Last synced: 14 May 2025
https://github.com/dmitryryumin/cvpr-2023-24-papers
CVPR 2023-2024 Papers: Dive into advanced research presented at the leading computer vision conference. Keep up to date with the latest developments in computer vision and deep learning. Code included. ⭐ support visual intelligence development!
action-recognition autonomous-driving biometrics computer-vision cvpr cvpr2023 cvpr2024 datasets deep-learning face-recognition gesture-recognition image-synthesis medical-image-processing multi-modal-learning pattern-recognition scene-analysis segmentation self-supervised-learning shape-analysis video-synthesis
Last synced: 05 Apr 2025
https://github.com/huggingface/chug
Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.
computer-vision dataloading datasets distributed-training document-understanding multi-modal-learning pdf-document webdataset
Last synced: 14 Oct 2025
https://github.com/qizekun/ReCon
[ICML 2023] Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining
3d-point-clouds multi-modal-learning representation-learning self-supervised-learning
Last synced: 20 Mar 2025
https://github.com/rentainhe/trar-vqa
[ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"
attention clevr dynamic-network iccv2021 local-and-global multi-modal multi-modal-learning multi-modality multi-scale-features official pytorch transformer vision-and-language visual-question-answering visualization vqav2
Last synced: 28 Aug 2025
https://github.com/ttgeng233/UnAV
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)
audio-visual-events audio-visual-learning multi-modal-learning
Last synced: 09 May 2025
https://github.com/3dlg-hcvc/duoduoclip
[ICLR 2025] Duoduo CLIP: Efficient 3D Understanding with Multi-View Images
3d-classification 3d-shape-retrieval 3d-understanding clip multi-modal-learning pytorch
Last synced: 05 Apr 2025
https://github.com/kyegomez/megavit
The open source implementation of the model from "Scaling Vision Transformers to 22 Billion Parameters"
artificial-intelligence computer-vision gpt4 multi-modal multi-modal-fusion multi-modal-learning vision-and-language vision-transformer
Last synced: 19 Jul 2025
https://github.com/filipbasara0/simple-clip
A minimal, but effective implementation of CLIP (Contrastive Language-Image Pretraining) in PyTorch
contrastive-learning deep-learning machine-learning multi-modal-learning pytorch representation-learning self-supervised-learning siglip zero-shot-classification
Last synced: 11 Apr 2025
https://github.com/chenxi52/CMPF
Open-Vocabulary Panoptic Segmentation
clip instance-segmentation multi-modal-learning open-vocabulary open-vocabulary-segmentation open-vocabulary-semantic-segmentation panoptic-segmentation segment-anything segmentation vision-and-language zero-shot
Last synced: 24 Jul 2025
https://github.com/depshad/deep-learning-framework-for-multi-modal-product-classification
Code repository for Rakuten Data Challenge: Multimodal Product Classification and Retrieval.
computer-vision deep-learning multi-modal-learning nlp pytorch rakuten-data-challenge
Last synced: 04 Sep 2025
https://github.com/kyegomez/neva
The open source implementation of "NeVA: NeMo Vision and Language Assistant"
artificial-intelligence cuda gpt4 multi-modal multi-modal-learning multithreading neva nvidia robotics
Last synced: 15 Oct 2025
https://github.com/agora-lab-ai/ekr
Elysium Knowledge Repository is an open source initiative to embed all of Humanity's multi-modal knowledge and wisdom.
artificial-intelligence chroma embeddings multi-modal-learning multimodal pinecone vectordatabase
Last synced: 10 Aug 2025
https://github.com/sayakpaul/multimodal-entailment-baseline
This repository shows how to implement a basic model for multimodal entailment.
entailment keras multi-modal-learning tensorflow
Last synced: 07 May 2025
https://github.com/mailcorahul/auto_labeler
auto_labeler - An all-in-one library to automatically label vision data
deep-learning deep-learning-library image-classification instance-segmentation multi-modal-learning object-detection pseudo-labeling text-to-image
Last synced: 05 Mar 2026
https://github.com/fullscreen-triangle/four-sided-triangle
A sophisticated multi-model optimization pipeline for domain-expert knowledge extraction RAG systems
ai claude-3-5-sonnet domain-experts multi-modal-learning openai optimization-algorithms prompt-engineering prompt-tuning rag retrieval-augmented-generation
Last synced: 22 Jun 2025
https://github.com/lyuchenyang/semantic-aware-videoqa
Code for ACL SRW 2023 paper "Semantic-aware Dynamic Retrospective-Prospective Reasoning for Event-level Video Question Answering"
artificial-intelligence deep-learning machine-learning multi-modal-learning natural-language-processing video-question-answering
Last synced: 15 Mar 2025
https://github.com/ammarlodhi255/metadata-augmented-neural-networks-for-wild-animal-classification
This repository contains the implementation code for the paper "Metadata Augmented Neural Networks For Wild Animal Classification": https://www.sciencedirect.com/science/article/pii/S1574954124003479.
deep-learning fusion-techniques metadata metadata-fusion multi-modal multi-modal-learning wild-animal-classification wild-life-monitoring
Last synced: 25 May 2026
https://github.com/amazon-science/contrastive_emc2
Code for the ICML 2024 paper: "EMC^2: Efficient MCMC Negative Sampling for Contrastive Learning with Global Convergence"
contrastive-learning deep-neural-networks machine-learning machine-learning-algorithms mcmc-sampling multi-modal multi-modal-learning
Last synced: 04 Oct 2025
https://github.com/lyuchenyang/efficient-videoqa
Code for ACL SustaiNLP 2023 paper "Is a Video worth n × n Images? A Highly Efficient Approach to Transformer-based Video Question Answering"
artificial-intelligence deep-learning machine-learning multi-modal-learning natural-language-processing video-question-answering
Last synced: 15 Mar 2025
https://github.com/jianzhnie/multimodaltransformers
lmmtoolkit is a toolkit for Multi-Modal Learning
image-text multi-modal-learning text-image text-to-video
Last synced: 15 Sep 2025
https://github.com/stifler7/multi-modal-learning-for-image-and-text-analysis
Develops approaches for jointly analyzing images and text using deep learning. Covers applications like image-text matching, visual question answering, image captioning, and sentiment analysis with visual context.
machine-learning multi-modal-learning
Last synced: 24 Mar 2025
https://github.com/loharmurtaza/fog_detection_subject_dependent
This repository is based on my research work "Detecting Freezing of Gait in Parkinson's Disease Patients Using Multi-Modal Machine Learning"
accelerometer detection eeg emg f1-score freezing-of-gait gyroscope machine-learning mfcc multi-modal-learning rf sensitivity skin-conductance specificity svm
Last synced: 20 Jan 2026