Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


Projects in Awesome Lists tagged with multi-modal-learning

A curated list of projects in awesome lists tagged with multi-modal-learning.

https://github.com/lyuchenyang/macaw-llm

Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration

deep-learning language-model machine-learning multi-modal-learning natural-language-processing neural-networks

Last synced: 19 Dec 2024

https://github.com/nvlabs/prismer

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

image-captioning language-model multi-modal-learning multi-task-learning vision-and-language vision-language-model vqa

Last synced: 15 Dec 2024

https://github.com/lucidrains/x-clip

A concise but complete implementation of CLIP with various experimental improvements from recent papers

artificial-intelligence contrastive-learning deep-learning multi-modal-learning zero-shot-learning

Last synced: 21 Dec 2024

https://github.com/dmitryryumin/cvpr-2023-24-papers

CVPR 2023-2024 Papers: Dive into advanced research presented at the leading computer vision conference. Keep up to date with the latest developments in computer vision and deep learning. Code included. ⭐ support visual intelligence development!

action-recognition autonomous-driving biometrics computer-vision cvpr cvpr2023 cvpr2024 datasets deep-learning face-recognition gesture-recognition image-synthesis medical-image-processing multi-modal-learning pattern-recognition scene-analysis segmentation self-supervised-learning shape-analysis video-synthesis

Last synced: 15 Dec 2024

https://github.com/qizekun/ReCon

[ICML 2023] Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining

3d-point-clouds multi-modal-learning representation-learning self-supervised-learning

Last synced: 28 Oct 2024

https://github.com/rentainhe/trar-vqa

[ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"

attention clevr dynamic-network iccv2021 local-and-global multi-modal multi-modal-learning multi-modality multi-scale-features official pytorch transformer vision-and-language visual-question-answering visualization vqav2

Last synced: 07 Nov 2024

https://github.com/ttgeng233/UnAV

Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)

audio-visual-events audio-visual-learning multi-modal-learning

Last synced: 16 Nov 2024

https://github.com/kyegomez/neva

The open source implementation of "NeVA: NeMo Vision and Language Assistant"

artificial-intelligence cuda gpt4 multi-modal multi-modal-learning multithreading neva nvidia robotics

Last synced: 09 Nov 2024

https://github.com/kyegomez/megavit

The open source implementation of the model from "Scaling Vision Transformers to 22 Billion Parameters"

artificial-intelligence computer-vision gpt4 multi-modal multi-modal-fusion multi-modal-learning vision-and-language vision-transformer

Last synced: 09 Nov 2024

https://github.com/sayakpaul/multimodal-entailment-baseline

This repository shows how to implement a basic model for multimodal entailment.

entailment keras multi-modal-learning tensorflow

Last synced: 23 Oct 2024

https://github.com/agora-lab-ai/ekr

Elysium Knowledge Repository is an open-source initiative to embed all of humanity's multi-modal knowledge and wisdom.

artificial-intelligence chroma embeddings multi-modal-learning multimodal pinecone vectordatabase

Last synced: 10 Nov 2024

https://github.com/jianzhnie/multimodaltransformers

lmmtoolkit is a toolkit for multi-modal learning.

image-text multi-modal-learning text-image text-to-video

Last synced: 06 Dec 2024

https://github.com/lyuchenyang/semantic-aware-videoqa

Code for the ACL SRW 2023 paper "Semantic-aware Dynamic Retrospective-Prospective Reasoning for Event-level Video Question Answering"

artificial-intelligence deep-learning machine-learning multi-modal-learning natural-language-processing video-question-answering

Last synced: 21 Nov 2024

https://github.com/lyuchenyang/efficient-videoqa

Code for the ACL SustaiNLP 2023 paper "Is a Video worth n × n Images? A Highly Efficient Approach to Transformer-based Video Question Answering"

artificial-intelligence deep-learning machine-learning multi-modal-learning natural-language-processing video-question-answering

Last synced: 21 Nov 2024

https://github.com/liu42/contrastive

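A cross-modal image-text mutual retrieval model based on contrastive learning in a shared feature space. The project is drawn from Problem B of the 2024 "Teddy Cup" Data Mining Challenge.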

bert cnn computer-vision contrastive-learning deep-learning image-text-retrieval image-text-search multi-modal multi-modal-learning nlp pytorch roberta transformers

Last synced: 13 Dec 2024

https://github.com/ammarlodhi255/metadata-augmented-neural-networks-for-wild-animal-classification

This repository contains the implementation code for the paper "Metadata Augmented Neural Networks For Wild Animal Classification".

deep-learning fusion-techniques metadata metadata-fusion multi-modal multi-modal-learning wild-animal-classification wild-life-monitoring

Last synced: 17 Nov 2024

https://github.com/stifler7/multi-modal-learning-for-image-and-text-analysis

Develops approaches for jointly analyzing images and text using deep learning. Covers applications like image-text matching, visual question answering, image captioning, and sentiment analysis with visual context.

machine-learning multi-modal-learning

Last synced: 01 Dec 2024

https://github.com/amazon-science/contrastive_emc2

Code for the ICML 2024 paper "EMC^2: Efficient MCMC Negative Sampling for Contrastive Learning with Global Convergence"

contrastive-learning deep-neural-networks machine-learning machine-learning-algorithms mcmc-sampling multi-modal multi-modal-learning

Last synced: 12 Nov 2024

https://github.com/loharmurtaza/fog_detection

This repository is based on my research work "Detecting Freezing of Gait in Parkinson's Disease Patients Using Multi-Modal Machine Learning"

accelerometer detection eeg emg f1-score freezing-of-gait gyroscope machine-learning mfcc multi-modal-learning rf sensitivity skin-conductance specificity svm

Last synced: 26 Sep 2024

https://github.com/loharmurtaza/fog_detection_subject_dependent

This repository is based on my research work "Detecting Freezing of Gait in Parkinson's Disease Patients Using Multi-Modal Machine Learning"

accelerometer detection eeg emg f1-score freezing-of-gait gyroscope machine-learning mfcc multi-modal-learning rf sensitivity skin-conductance specificity svm

Last synced: 20 Dec 2024