Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with multimodal-deep-learning

A curated list of projects in awesome lists tagged with multimodal-deep-learning.

https://github.com/jrzaurin/pytorch-widedeep

A flexible package for multimodal-deep-learning to combine tabular data with text and images using Wide and Deep models in Pytorch

deep-learning images model-hub multimodal-deep-learning python pytorch pytorch-cv pytorch-nlp pytorch-tabular-data pytorch-transformers tabular-data text

Last synced: 18 Dec 2024

https://github.com/dwctod/cvpr2024-papers-with-code-demo

Collects the latest CVPR (Conference on Computer Vision and Pattern Recognition) results, including papers, code, and demo videos. Recommendations are welcome!

computer-vision cvpr cvpr2021 cvpr2022 cvpr2023 cvpr2024 llm multimodal-deep-learning object-detection segment-anything segmentation

Last synced: 30 Nov 2024

https://github.com/DWCTOD/CVPR2024-Papers-with-Code-Demo

Collects the latest CVPR (Conference on Computer Vision and Pattern Recognition) results, including papers, code, and demo videos. Recommendations are welcome!

computer-vision cvpr cvpr2021 cvpr2022 cvpr2023 cvpr2024 llm multimodal-deep-learning object-detection segment-anything segmentation

Last synced: 01 Nov 2024

https://github.com/kyegomez/bitnet

Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in pytorch

artificial-intelligence deep-neural-networks deeplearning gpt4 machine-learning multimodal multimodal-deep-learning

Last synced: 14 Dec 2024

https://github.com/kyegomez/BitNet

Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in pytorch

artificial-intelligence deep-neural-networks deeplearning gpt4 machine-learning multimodal multimodal-deep-learning

Last synced: 06 Nov 2024

https://github.com/declare-lab/multimodal-deep-learning

This repository contains various models targeting multimodal representation learning and multimodal fusion for downstream tasks such as multimodal sentiment analysis.

multimodal-deep-learning multimodal-interactions multimodal-learning multimodal-sentiment-analysis

Last synced: 08 Nov 2024

https://github.com/sail-sg/CLoT

CVPR'24, Official Codebase of our Paper: "Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation".

association humor-generation large-language-models leap-of-thought multimodal-deep-learning

Last synced: 09 Nov 2024

https://github.com/kyegomez/navit

My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"

attention-mechanism clip gpt4 multimodal multimodal-deep-learning multimodal-learning multimodality vit

Last synced: 16 Dec 2024

https://github.com/mahmoodlab/mcat

Multimodal Co-Attention Transformer for Survival Prediction in Gigapixel Whole Slide Images - ICCV 2021

early-fusion genomics mahmoodlab mcat multimodal multimodal-deep-learning multimodal-fusion pathology

Last synced: 20 Dec 2024

https://github.com/declare-lab/multimodal-infomax

This repository contains the official implementation code of the paper Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis, accepted at EMNLP 2021.

multimodal-deep-learning multimodal-fusion multimodal-sentiment-analysis

Last synced: 19 Dec 2024

https://github.com/mahmoodlab/MCAT

Multimodal Co-Attention Transformer for Survival Prediction in Gigapixel Whole Slide Images - ICCV 2021

early-fusion genomics mahmoodlab mcat multimodal multimodal-deep-learning multimodal-fusion pathology

Last synced: 13 Nov 2024

https://github.com/kyegomez/pali3

Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"

artificial-intelligence autogpt gpt4 machine-learning multimodal multimodal-deep-learning multimodal-learning multimodality

Last synced: 20 Dec 2024

https://github.com/LeapLabTHU/Pseudo-Q

[CVPR 2022] Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding

computer-vision cvpr2022 deep-learning multimodal-deep-learning pytorch vision-and-language visual-grounding

Last synced: 04 Nov 2024

https://github.com/om-ai-lab/vl-checklist

Evaluating Vision & Language Pretraining Models with Objects, Attributes and Relations. [EMNLP 2022]

evaluation-metrics multimodal-deep-learning vision-and-language

Last synced: 16 Dec 2024

https://github.com/shamanez/self-supervised-embedding-fusion-transformer

The code for our IEEE ACCESS (2020) paper Multimodal Emotion Recognition with Transformer-Based Self Supervised Feature Fusion.

bert emotion-recognition multimodal-deep-learning multimodal-emotion-recognition multimodal-sentiment-analysis self-supervised-learning

Last synced: 06 Dec 2024

https://github.com/dirtyharrylyl/dj-rn

Part of the HAKE project (HAKE-3D). Code for our CVPR2020 paper "Detailed 2D-3D Joint Representation for Human-Object Interaction".

3d-reconstruction human-object-interaction multimodal-deep-learning

Last synced: 18 Nov 2024

https://github.com/cambridgeltl/visual-spatial-reasoning

[TACL'23] VSR: A probing benchmark for spatial understanding of vision-language models.

computer-vision multimodal-deep-learning nlp vision-and-language

Last synced: 04 Nov 2024

https://github.com/kyegomez/pali

Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model"

artificial-intelligence gpt4 machine-learning multimodal multimodal-deep-learning multimodality

Last synced: 19 Dec 2024

https://github.com/vision-cair/3dcompat-v2

3DCoMPaT++: An improved large-scale 3D vision dataset for compositional recognition

3d compositional-learning computer-vision deep-learning multimodal-deep-learning

Last synced: 17 Dec 2024

https://github.com/kyegomez/kosmos2.5

My implementation of Kosmos2.5 from the paper: "KOSMOS-2.5: A Multimodal Literate Model"

attention attention-is-all-you-need gpt3 gpt4 kosmos multi-modality multimodal multimodal-deep-learning opensource

Last synced: 16 Dec 2024

https://github.com/showlab/lova3

(NeurIPS 2024) Learning to Visual Question Answering, Asking and Assessment

benchmark data-asse multimodal-deep-learning multimodal-large-language-models visual-question-answering visual-question-generation

Last synced: 17 Nov 2024

https://github.com/declare-lab/bbfn

This repository contains the implementation of the paper "Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis"

huggingface-transformers multimodal-deep-learning multimodal-representation multimodal-sentiment-analysis

Last synced: 08 Nov 2024

https://github.com/sutdcv/SUTD-TrafficQA

[CVPR2021] SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events

annotations cvpr cvpr2021 dataset multimodal multimodal-deep-learning paper traffic-events video-qa video-reasoning vqa vqa-dataset

Last synced: 27 Oct 2024

https://github.com/naver/artemis

Official code release for ARTEMIS: Attention-based Retrieval with Text-Explicit Matching and Implicit Similarity (published at ICLR 2022)

image-retrieval multimodal-deep-learning multimodal-retrieval

Last synced: 08 Nov 2024

https://ai4ce.github.io/MARS/

[CVPR2024] Multiagent Multitraversal Multimodal Self-Driving: Open MARS Dataset

3dgs collaborative-perception coperception cvpr2024 dataset multiagent multimodal-deep-learning nerf self-driving

Last synced: 08 Nov 2024

https://github.com/declare-lab/llm-puzzletest

This repository is maintained to release the dataset and models for multimodal puzzle reasoning.

gemini gemini-pro gpt-4 language-model large-language-models llm llms multimodal-deep-learning multimodal-learning

Last synced: 08 Nov 2024

https://github.com/declare-lab/msa-robustness

NAACL 2022 paper on Analyzing Modality Robustness in Multimodal Sentiment Analysis

multimodal-deep-learning multimodal-sentiment-analysis robustness-analysis

Last synced: 08 Nov 2024

https://github.com/yuhui-zh15/drml

Official Code Release for "Diagnosing and Rectifying Vision Models using Language" (ICLR 2023)

computer-vision iclr2023 machine-learning multimodal-deep-learning natural-language-processing representation-learning

Last synced: 08 Nov 2024

https://github.com/ai4ce/MARS

[CVPR2024] Multiagent Multitraversal Multimodal Self-Driving: Open MARS Dataset

3dgs collaborative-perception coperception cvpr2024 dataset multiagent multimodal-deep-learning nerf self-driving

Last synced: 27 Oct 2024

https://github.com/kyegomez/multimodalcrossattn

The open source implementation of the cross attention mechanism from the paper: "JOINTLY TRAINING LARGE AUTOREGRESSIVE MULTIMODAL MODELS"

artificial-intelligence attention attention-is-all-you-need attention-mechanism attn gpt4 multimodal multimodal-deep-learning

Last synced: 09 Nov 2024

https://github.com/declare-lab/mm-align

[EMNLP 2022] This repository contains the official implementation of the paper "MM-Align: Learning Optimal Transport-based Alignment Dynamics for Fast and Accurate Inference on Missing Modality Sequences"

machine-learning multimodal-deep-learning multimodal-sentiment-analysis natural-language-processing optimal-transport

Last synced: 08 Nov 2024

https://github.com/declare-lab/m2h2-dataset

This repository contains the dataset and baselines explained in the paper: M2H2: A Multimodal Multiparty Hindi Dataset For Humor Recognition in Conversations

emotion-recognition-in-conversation humor-detection multimodal-deep-learning multimodal-fusion

Last synced: 08 Nov 2024

https://github.com/kyegomez/pegasus

PegasusX: The Future of Multimodal Embeddings 🦄 🦄

artificial-intelligence embeddings gpt4 multimodal multimodal-deep-learning openai

Last synced: 16 Nov 2024

https://github.com/prithivirajdamodaran/vision-language-modelling-series

Companion Repo for the Vision Language Modelling YouTube series - https://bit.ly/3PsbsC2 - by Prithivi Da. Open to PRs and collaborations

multimodal-deep-learning multimodal-interactions multimodal-representation vision-and-language vision-and-language-navigation vision-and-language-pre-training

Last synced: 28 Nov 2024

https://github.com/laminetourelab/mogonet

MOGONET (Multi-Omics Graph cOnvolutional NETworks) is a multi-omics data integrative analysis framework for classification tasks in biomedical applications.

deep-learning graph-convolutional-networks machine-learning mogonet multi-omics multi-omics-integration multimodal-deep-learning pytorch

Last synced: 08 Nov 2024

https://github.com/choyingw/gais-net

CVPR 2020 Workshop on Scalability in Autonomous Driving: GAIS-Net: Geometry-Aware Instance Segmentation with Disparity Maps

3d autonomous-vehicles computer-vision cvpr2020 deep-neural-networks geometry-processing instance-segmentation mask-rcnn multimodal-deep-learning scalability stereo-vision

Last synced: 06 Nov 2024

https://github.com/adrianbzg/hyperbert

Code for "HyperBERT: Mixing Hypergraph-Aware Layers with Language Models for Node Classification on Text-Attributed Hypergraphs"

deep-learning hypergraphs language-model multimodal-deep-learning

Last synced: 05 Nov 2024

https://github.com/kyegomez/mmca-mgqa

Experiments around using Multi-Modal Causal Attention with Multi-Grouped Query Attention

artificial-intelligence attention attention-is-all-you-need attention-mechanism gpt4 multimodal multimodal-deep-learning multimodality

Last synced: 09 Nov 2024

https://github.com/janteichertkluge/dmlsim

This library provides packages on DoubleML / Causal Machine Learning and Neural Networks in Python for Simulation and Case Studies.

beit bert case-study causal causal-inference causal-machine-learning deep-learning dgp double-machine-learning doubleml machine-learning multi-modal multimodal multimodal-deep-learning neural-network simulation transformer transformers

Last synced: 14 Oct 2024

https://github.com/jiayuww/spatialeval

[NeurIPS'24] SpatialEval: a benchmark to evaluate spatial reasoning abilities of MLLMs and LLMs

claude foundation-models gemini gpt-4o gpt-4v large-language-models llama3 machine-learning multimodal-deep-learning reasoning spatial-reasoning vision-language-models

Last synced: 03 Dec 2024

https://github.com/chikap421/videosam

This repository accompanies the paper "VideoSAM: A Large Vision Foundation Model for High-Speed Video Segmentation"

cnn computer-vision multimodal-deep-learning segment-anything-model vision-transformers

Last synced: 30 Nov 2024

https://github.com/ashutosh1919/data2vec-pytorch

Ready-to-run PyTorch implementation of Data2Vec 2.0: Highly efficient self-supervised representation learning for vision, speech and text.

audio-machine-learning computer-vision data2vec deep-learning embedding-models multimodal-deep-learning nlp pytorch self-supervised-learning

Last synced: 27 Oct 2024

https://github.com/kyegomez/odin

SOTA Classification at scale for UAVs, Drones, and much more

computer-vision multimodal multimodal-data multimodal-deep-learning swarm-intelligence

Last synced: 09 Nov 2024

https://github.com/adrianbzg/sfavel

Code for "Unsupervised Pretraining for Fact Verification by Language Model Distillation" (ICLR 2024)

deep-learning knowledge-distillation knowledge-graphs language-model multimodal-deep-learning natural-language-processing self-supervised-learning

Last synced: 05 Nov 2024

https://github.com/dnth/postgresql-multimodal-retrieval

Vector/Hybrid Search & Retrieval on PostgreSQL database using Vision Language Model.

huggingface-datasets multimodal-deep-learning pgvector postgresql retrieval search-engine vector-database vision-language-model

Last synced: 06 Dec 2024

https://github.com/gustavocidornelas/fused-multimodal-emotion

Multimodal emotion recognition using lexico-acoustic language descriptions

deep-learning emotion-recognition multimodal-deep-learning

Last synced: 10 Nov 2024

https://github.com/frankaging/generative-physics-inference

Slip or Not? Unsupervised Learning to Understand Physical Scene Using Multimodal Variational Physics Inference Network

cognitive-services generative-model intuitive-physics multimodal-deep-learning physics-simulation variational-autoencoder variational-inference

Last synced: 11 Nov 2024

https://github.com/iamdanialkamali/memotionanalysis

Meme Sentiment Analysis SemEval 2020 Task 9

keras multimodal-deep-learning tensorflow

Last synced: 22 Nov 2024

https://github.com/dermatologist/kedro-tf-text

Kedro pipelines for preprocessing text and tabular data for multi-modal ML in TensorFlow.

bert gpt hacktoberfest healthcare kedro medical multimodal-deep-learning nlp-machine-learning

Last synced: 21 Dec 2024

https://github.com/chikap421/mseg_vcuq

This repository accompanies the paper "MSEG-VCUQ: Multimodal SEGmentation with Enhanced Vision Foundation Models, Convolutional Neural Networks, and Uncertainty Quantification for High-Speed Video Phase Detection Data"

cnn computer-vision multimodal-deep-learning uncertainty-quantification vision-transformer

Last synced: 30 Nov 2024

https://github.com/ibnaleem/mikael

The open-source repository for Mikael, a Discord chatbot trained on Mistral and LLaVA language models

artificial-intelligence chatbot discord-bot discord-py gpt-4 large-language-models llava mistral mistral-7b mistral-ai multimodal multimodal-deep-learning

Last synced: 15 Dec 2024

https://github.com/eva-kaushik/emkgcn-multimodal-music-recommender

The `MKGCN` class, coupled with the Spotify API, orchestrates a multi-modal knowledge graph convolutional network to enhance music recommendation systems by integrating user interaction data and diverse music modalities.

emkgcn multimodal-deep-learning music-recommendation-system spotify-api spotipy-library

Last synced: 05 Nov 2024

https://github.com/aeternalis-ingenium/v4vision-poc-backend

API to infer automated disease detection and report generation from medical images.

llm machine-learning med-tech multimodal-deep-learning multimodality radiology software-engineering

Last synced: 14 Nov 2024

https://github.com/macabdul9/torchmm

PyTorch Data loaders and abstraction for multi-modal data.

computer-vision multimodal-deep-learning natural-language-processing python pytorch speech-processing

Last synced: 01 Oct 2024

https://github.com/etienne-bobo/skimlit-nlp

The purpose of this project is to build an NLP model to make reading medical abstracts easier.

keras-tensorflow multimodal-deep-learning nlp tensorflow2

Last synced: 11 Nov 2024

https://github.com/vasugi2003/fusion-ai---multimodal-persuvasiveness-prediction

Developed a system to predict persuasiveness using multi-modal data (text, images, audio). Utilized BERT for text embeddings, ResNet for image features, and Librosa for audio analysis. Fused data from all modalities for enhanced prediction accuracy.

ai bert-model fusion librosa multimodal-deep-learning python resnet-50 tensorflow

Last synced: 05 Nov 2024

https://github.com/slinusc/path-vqa-blip

Fine-tuning BLIP for pathological visual question answering.

blip multimodal-deep-learning pathology

Last synced: 10 Nov 2024