Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with multi-modal-learning
A curated list of projects in awesome lists tagged with multi-modal-learning.
https://github.com/mlfoundations/open_clip
An open source implementation of CLIP.
computer-vision contrastive-loss deep-learning language-model multi-modal-learning pretrained-models pytorch zero-shot-classification
Last synced: 16 Dec 2024
https://github.com/ofa-sys/chinese-clip
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
chinese clip computer-vision contrastive-loss coreml-models deep-learning image-text-retrieval multi-modal multi-modal-learning nlp pretrained-models pytorch transformers vision-and-language-pre-training vision-language
Last synced: 16 Dec 2024
https://github.com/OFA-Sys/Chinese-CLIP
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
chinese clip computer-vision contrastive-loss coreml-models deep-learning image-text-retrieval multi-modal multi-modal-learning nlp pretrained-models pytorch transformers vision-and-language-pre-training vision-language
Last synced: 03 Nov 2024
https://github.com/lyuchenyang/macaw-llm
Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
deep-learning language-model machine-learning multi-modal-learning natural-language-processing neural-networks
Last synced: 19 Dec 2024
https://github.com/lyuchenyang/Macaw-LLM
Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
deep-learning language-model machine-learning multi-modal-learning natural-language-processing neural-networks
Last synced: 07 Nov 2024
https://github.com/QIN2DIM/hcaptcha-challenger
🥂 Gracefully face hCaptcha challenge with MoE(ONNX) embedded solution.
clip computer-vision hcaptcha hcaptcha-solver image-segmentation multi-modal multi-modal-learning object-detection onnx onnx-models onnxruntime opencv-python playwright solver yolo yolov5 zero-shot-classification
Last synced: 31 Oct 2024
https://github.com/qin2dim/hcaptcha-challenger
🥂 Gracefully face hCaptcha challenge with MoE(ONNX) embedded solution.
clip computer-vision hcaptcha hcaptcha-solver image-segmentation multi-modal multi-modal-learning object-detection onnx onnx-models onnxruntime opencv-python playwright solver yolo yolov5 zero-shot-classification
Last synced: 19 Dec 2024
https://github.com/nvlabs/prismer
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
image-captioning language-model multi-modal-learning multi-task-learning vision-and-language vision-language-model vqa
Last synced: 15 Dec 2024
https://github.com/lucidrains/x-clip
A concise but complete implementation of CLIP with various experimental improvements from recent papers
artificial-intelligence contrastive-learning deep-learning multi-modal-learning zero-shot-learning
Last synced: 21 Dec 2024
https://github.com/kyegomez/zeta
Build high-performance AI models with modular building blocks
artificial-intelligence deep-learning gpt4 llama2 longnet multi-agent-systems multi-modal multi-modal-learning multi-platform pytorch speech-recognition transformer transformers
Last synced: 19 Dec 2024
https://github.com/dmitryryumin/cvpr-2023-24-papers
CVPR 2023-2024 Papers: Dive into advanced research presented at the leading computer vision conference. Keep up to date with the latest developments in computer vision and deep learning. Code included. ⭐ support visual intelligence development!
action-recognition autonomous-driving biometrics computer-vision cvpr cvpr2023 cvpr2024 datasets deep-learning face-recognition gesture-recognition image-synthesis medical-image-processing multi-modal-learning pattern-recognition scene-analysis segmentation self-supervised-learning shape-analysis video-synthesis
Last synced: 15 Dec 2024
https://github.com/qizekun/ReCon
[ICML 2023] Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining
3d-point-clouds multi-modal-learning representation-learning self-supervised-learning
Last synced: 28 Oct 2024
https://github.com/rentainhe/trar-vqa
[ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"
attention clevr dynamic-network iccv2021 local-and-global multi-modal multi-modal-learning multi-modality multi-scale-features official pytorch transformer vision-and-language visual-question-answering visualization vqav2
Last synced: 07 Nov 2024
https://github.com/ttgeng233/UnAV
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)
audio-visual-events audio-visual-learning multi-modal-learning
Last synced: 16 Nov 2024
https://github.com/kyegomez/neva
The open source implementation of "NeVA: NeMo Vision and Language Assistant"
artificial-intelligence cuda gpt4 multi-modal multi-modal-learning multithreading neva nvidia robotics
Last synced: 09 Nov 2024
https://github.com/kyegomez/megavit
The open source implementation of the model from "Scaling Vision Transformers to 22 Billion Parameters"
artificial-intelligence computer-vision gpt4 multi-modal multi-modal-fusion multi-modal-learning vision-and-language vision-transformer
Last synced: 09 Nov 2024
https://github.com/sayakpaul/multimodal-entailment-baseline
This repository shows how to implement a basic model for multimodal entailment.
entailment keras multi-modal-learning tensorflow
Last synced: 23 Oct 2024
https://github.com/agora-lab-ai/ekr
Elysium Knowledge Repository is an open source initiative to embed all of Humanity's multi-modal knowledge and wisdom.
artificial-intelligence chroma embeddings multi-modal-learning multimodal pinecone vectordatabase
Last synced: 10 Nov 2024
https://github.com/chenxi52/FrozenSeg
Open-Vocabulary Panoptic Segmentation
clip instance-segmentation multi-modal-learning open-vocabulary open-vocabulary-segmentation open-vocabulary-semantic-segmentation panoptic-segmentation segment-anything segmentation vision-and-language zero-shot
Last synced: 30 Nov 2024
https://github.com/jianzhnie/multimodaltransformers
lmmtoolkit is a toolkit for Multi-Modal Learning
image-text multi-modal-learning text-image text-to-video
Last synced: 06 Dec 2024
https://github.com/lyuchenyang/semantic-aware-videoqa
Code for ACL SRW 2023 paper "Semantic-aware Dynamic Retrospective-Prospective Reasoning for Event-level Video Question Answering"
artificial-intelligence deep-learning machine-learning multi-modal-learning natural-language-processing video-question-answering
Last synced: 21 Nov 2024
https://github.com/lyuchenyang/efficient-videoqa
Code for ACL SustaiNLP 2023 paper "Is a Video worth n × n Images? A Highly Efficient Approach to Transformer-based Video Question Answering"
artificial-intelligence deep-learning machine-learning multi-modal-learning natural-language-processing video-question-answering
Last synced: 21 Nov 2024
https://github.com/liu42/contrastive
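Project based on Problem B of the 2024 "Teddy Cup" Data Mining Challenge: a cross-modal image-text retrieval model using contrastive learning in a shared feature space.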
bert cnn computer-vision contrastive-learning deep-learning image-text-retrieval image-text-search multi-modal multi-modal-learning nlp pytorch roberta transformers
Last synced: 13 Dec 2024
https://github.com/ammarlodhi255/metadata-augmented-neural-networks-for-wild-animal-classification
This repository contains the implementation code for the paper "Metadata Augmented Neural Networks For Wild Animal Classification".
deep-learning fusion-techniques metadata metadata-fusion multi-modal multi-modal-learning wild-animal-classification wild-life-monitoring
Last synced: 17 Nov 2024
https://github.com/stifler7/multi-modal-learning-for-image-and-text-analysis
Develops approaches for jointly analyzing images and text using deep learning. Covers applications like image-text matching, visual question answering, image captioning, and sentiment analysis with visual context.
machine-learning multi-modal-learning
Last synced: 01 Dec 2024
https://github.com/amazon-science/contrastive_emc2
Code for the ICML 2024 paper: "EMC^2: Efficient MCMC Negative Sampling for Contrastive Learning with Global Convergence"
contrastive-learning deep-neural-networks machine-learning machine-learning-algorithms mcmc-sampling multi-modal multi-modal-learning
Last synced: 12 Nov 2024
https://github.com/loharmurtaza/fog_detection
This repository is based on my research work "Detecting Freezing of Gait in Parkinson's Disease Patients Using Multi-Modal Machine Learning"
accelerometer detection eeg emg f1-score freezing-of-gait gyroscope machine-learning mfcc multi-modal-learning rf sensitivity skin-conductance specificity svm
Last synced: 26 Sep 2024
https://github.com/loharmurtaza/fog_detection_subject_dependent
This repository is based on my research work "Detecting Freezing of Gait in Parkinson's Disease Patients Using Multi-Modal Machine Learning"
accelerometer detection eeg emg f1-score freezing-of-gait gyroscope machine-learning mfcc multi-modal-learning rf sensitivity skin-conductance specificity svm
Last synced: 20 Dec 2024