Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with multimodal-deep-learning
A curated list of projects in awesome lists tagged with multimodal-deep-learning.
https://github.com/salesforce/lavis
LAVIS - A One-stop Library for Language-Vision Intelligence
deep-learning deep-learning-library image-captioning multimodal-datasets multimodal-deep-learning salesforce vision-and-language vision-framework vision-language-pretraining vision-language-transformer visual-question-anwsering
Last synced: 20 Dec 2024
https://github.com/salesforce/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
deep-learning deep-learning-library image-captioning multimodal-datasets multimodal-deep-learning salesforce vision-and-language vision-framework vision-language-pretraining vision-language-transformer visual-question-anwsering
Last synced: 25 Oct 2024
https://github.com/kimmeen/time-llm
[ICLR 2024] Official implementation of "🦙 Time-LLM: Time Series Forecasting by Reprogramming Large Language Models"
cross-modal-learning cross-modality deep-learning language-model large-language-models machine-learning multimodal-deep-learning multimodal-time-series prompt-tuning time-series time-series-analysis time-series-forecast time-series-forecasting
Last synced: 19 Dec 2024
https://github.com/KimMeen/Time-LLM
[ICLR 2024] Official implementation of "🦙 Time-LLM: Time Series Forecasting by Reprogramming Large Language Models"
cross-modal-learning cross-modality deep-learning language-model large-language-models machine-learning multimodal-deep-learning multimodal-time-series prompt-tuning time-series time-series-analysis time-series-forecast time-series-forecasting
Last synced: 06 Nov 2024
https://github.com/jrzaurin/pytorch-widedeep
A flexible package for multimodal deep learning that combines tabular data with text and images using Wide and Deep models in PyTorch
deep-learning images model-hub multimodal-deep-learning python pytorch pytorch-cv pytorch-nlp pytorch-tabular-data pytorch-transformers tabular-data text
Last synced: 18 Dec 2024
https://github.com/dwctod/cvpr2024-papers-with-code-demo
Collects the latest CVPR (Conference on Computer Vision and Pattern Recognition) results, including papers, code, and demo videos. Recommendations are welcome!
computer-vision cvpr cvpr2021 cvpr2022 cvpr2023 cvpr2024 llm multimodal-deep-learning object-detection segment-anything segmentation
Last synced: 30 Nov 2024
https://github.com/DWCTOD/CVPR2024-Papers-with-Code-Demo
Collects the latest CVPR (Conference on Computer Vision and Pattern Recognition) results, including papers, code, and demo videos. Recommendations are welcome!
computer-vision cvpr cvpr2021 cvpr2022 cvpr2023 cvpr2024 llm multimodal-deep-learning object-detection segment-anything segmentation
Last synced: 01 Nov 2024
https://github.com/kyegomez/bitnet
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch
artificial-intelligence deep-neural-networks deeplearning gpt4 machine-learning multimodal multimodal-deep-learning
Last synced: 14 Dec 2024
https://github.com/kyegomez/BitNet
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch
artificial-intelligence deep-neural-networks deeplearning gpt4 machine-learning multimodal multimodal-deep-learning
Last synced: 06 Nov 2024
https://github.com/alibabaresearch/advancedliteratemachinery
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
artificial-intelligence computer-vision document document-analysis document-intelligence document-recognition document-understanding documentai end-to-end-ocr multimodal multimodal-deep-learning ocr scene-text-detection scene-text-detection-recognition scene-text-recognition text-detection text-recognition vision-language vision-language-model vision-language-transformer
Last synced: 31 Oct 2024
https://github.com/AlibabaResearch/AdvancedLiterateMachinery
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
artificial-intelligence computer-vision document document-analysis document-intelligence document-recognition document-understanding documentai end-to-end-ocr multimodal multimodal-deep-learning ocr scene-text-detection scene-text-detection-recognition scene-text-recognition text-detection text-recognition vision-language vision-language-model vision-language-transformer
Last synced: 07 Nov 2024
https://github.com/declare-lab/multimodal-deep-learning
This repository contains various models targeting multimodal representation learning and multimodal fusion for downstream tasks such as multimodal sentiment analysis.
multimodal-deep-learning multimodal-interactions multimodal-learning multimodal-sentiment-analysis
Last synced: 08 Nov 2024
https://github.com/omriav/blended-latent-diffusion
Official implementation for "Blended Latent Diffusion" [SIGGRAPH 2023]
computer-vision deep-learning diffusion diffusion-models generative-model image-generation multimodal multimodal-deep-learning pytorch text-driven-editing text-guided-manipulation text-to-image text-to-image-synthesis
Last synced: 31 Oct 2024
https://github.com/theislab/scarches
Reference mapping for single-cell genomics
batch-correction data-integration deep-learning human-cell-atlas multimodal-deep-learning multiomics rna-seq-analysis scrna-seq single-cell single-cell-genomics
Last synced: 28 Nov 2024
https://github.com/MMMU-Benchmark/MMMU
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
computer-vision deep-learning deep-neural-networks evaluation foundation-models large-language-models large-multimodal-models llm llms machine-learning multimodal multimodal-deep-learning multimodal-learning multimodality natural-language-processing question-answering stem visual-question-answering
Last synced: 08 Nov 2024
https://github.com/fcakyon/content-moderation-deep-learning
Deep learning based content moderation from text, audio, video & image input modalities.
content-moderation content-ratings genre-classification movie-content-filter movie-trailer multimodal-deep-learning nsfw-recognition nudity-detection profanity-detection violence-detection
Last synced: 10 Dec 2024
https://github.com/soujanyaporia/mustard
Multimodal Sarcasm Detection Dataset
multimodal-deep-learning multimodal-interactions sarcasm sarcasm-detection
Last synced: 30 Nov 2024
https://github.com/sail-sg/CLoT
CVPR'24, Official Codebase of our Paper: "Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation".
association humor-generation large-language-models leap-of-thought multimodal-deep-learning
Last synced: 09 Nov 2024
https://github.com/westlake-repl/recommendation-systems-without-explicit-id-features-a-literature-review
Paper List of Pre-trained Foundation Recommender Models
chatgpt chatgpt3 chatgpt4rec cross-domain-recommendation cross-domainrecommendation foundation-model gpt4rec language-model large-language-model llm llm-recommendation llm4rec multimodal multimodal-deep-learning multimodalrecommendation pre-training recommendation-system recommender-system transfer-learning transferable
Last synced: 09 Nov 2024
https://github.com/dwctod/eccv2022-papers-with-code-demo
Collects the latest ECCV results, including papers, code, and demo videos. Recommendations are welcome!
ai computer-vision cv dataset diffusion eccv eccv2022 face-recognition image-segmentation multimodal-deep-learning nerf objection-detection vision-transformer
Last synced: 21 Nov 2024
https://github.com/kyegomez/Med-PaLM
Towards Generalist Biomedical AI
biomedical deep-learning gpt4 multimodal multimodal-deep-learning multimodality opensource
Last synced: 27 Oct 2024
https://github.com/kyegomez/med-palm
Towards Generalist Biomedical AI
biomedical deep-learning gpt4 multimodal multimodal-deep-learning multimodality opensource
Last synced: 15 Dec 2024
https://github.com/kyegomez/navit
My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"
attention-mechanism clip gpt4 multimodal multimodal-deep-learning multimodal-learning multimodality vit
Last synced: 16 Dec 2024
https://github.com/mahmoodlab/mcat
Multimodal Co-Attention Transformer for Survival Prediction in Gigapixel Whole Slide Images - ICCV 2021
early-fusion genomics mahmoodlab mcat multimodal multimodal-deep-learning multimodal-fusion pathology
Last synced: 20 Dec 2024
https://github.com/declare-lab/multimodal-infomax
This repository contains the official implementation code of the paper Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis, accepted at EMNLP 2021.
multimodal-deep-learning multimodal-fusion multimodal-sentiment-analysis
Last synced: 19 Dec 2024
https://github.com/mahmoodlab/MCAT
Multimodal Co-Attention Transformer for Survival Prediction in Gigapixel Whole Slide Images - ICCV 2021
early-fusion genomics mahmoodlab mcat multimodal multimodal-deep-learning multimodal-fusion pathology
Last synced: 13 Nov 2024
https://github.com/kyegomez/the-compiler
Seed, Code, Harvest: Grow Your Own App with Tree of Thoughts!
agora artficial-intelligence autogpt chain-of-thought chatgpt deep-learning deep-learning-algorithms multi-modal-fusion multi-modality multimodal-deep-learning prompt-engineering reinforcement-learning tree-of-thoughts
Last synced: 18 Dec 2024
https://github.com/kyegomez/pali3
Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"
artificial-intelligence autogpt gpt4 machine-learning multimodal multimodal-deep-learning multimodal-learning multimodality
Last synced: 20 Dec 2024
https://github.com/LeapLabTHU/Pseudo-Q
[CVPR 2022] Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding
computer-vision cvpr2022 deep-learning multimodal-deep-learning pytorch vision-and-language visual-grounding
Last synced: 04 Nov 2024
https://github.com/om-ai-lab/vl-checklist
Evaluating Vision & Language Pretraining Models with Objects, Attributes and Relations. [EMNLP 2022]
evaluation-metrics multimodal-deep-learning vision-and-language
Last synced: 16 Dec 2024
https://github.com/shamanez/self-supervised-embedding-fusion-transformer
The code for our IEEE ACCESS (2020) paper Multimodal Emotion Recognition with Transformer-Based Self Supervised Feature Fusion.
bert emotion-recognition multimodal-deep-learning multimodal-emotion-recognition multimodal-sentiment-analysis self-supervised-learning
Last synced: 06 Dec 2024
https://github.com/kyegomez/swarms-pytorch
Swarming algorithms like PSO, Ant Colony, Sakana, and more in PyTorch 😊
artificial-intelligence firefly gpt4 hivemind machine-learning multimodal multimodal-deep-learning multimodality networks neural-network pso swarm-algorithm swarm-intelligence swarm-optimization swarm-robotics swarms swarms-of-agents
Last synced: 20 Dec 2024
https://github.com/dirtyharrylyl/dj-rn
As a part of HAKE project (HAKE-3D). Code for our CVPR2020 paper "Detailed 2D-3D Joint Representation for Human-Object Interaction".
3d-reconstruction human-object-interaction multimodal-deep-learning
Last synced: 18 Nov 2024
https://github.com/cambridgeltl/visual-spatial-reasoning
[TACL'23] VSR: A probing benchmark for spatial understanding of vision-language models.
computer-vision multimodal-deep-learning nlp vision-and-language
Last synced: 04 Nov 2024
https://github.com/kyegomez/pali
Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model"
artificial-intelligence gpt4 machine-learning multimodal multimodal-deep-learning multimodality
Last synced: 19 Dec 2024
https://github.com/vision-cair/3dcompat-v2
3DCoMPaT++: An improved large-scale 3D vision dataset for compositional recognition
3d compositional-learning computer-vision deep-learning multimodal-deep-learning
Last synced: 17 Dec 2024
https://github.com/yuanze-lin/Learnable_Regions
[CVPR 2024] Official code for "Text-Driven Image Editing via Learnable Regions"
aigc diffusion-model diffusion-models generative-model multimodal-deep-learning text-driven-editing text-driven-image-editing text-driven-image-manipulation text-driven-manipulation text-image
Last synced: 30 Oct 2024
https://github.com/kyegomez/kosmos2.5
My implementation of Kosmos2.5 from the paper: "KOSMOS-2.5: A Multimodal Literate Model"
attention attention-is-all-you-need gpt3 gpt4 kosmos multi-modality multimodal multimodal-deep-learning opensource
Last synced: 16 Dec 2024
https://github.com/showlab/lova3
(NeurIPS 2024) Learning to Visual Question Answering, Asking and Assessment
benchmark data-asse multimodal-deep-learning multimodal-large-language-models visual-question-answering visual-question-generation
Last synced: 17 Nov 2024
https://github.com/declare-lab/bbfn
This repository contains the implementation of the paper "Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis"
huggingface-transformers multimodal-deep-learning multimodal-representation multimodal-sentiment-analysis
Last synced: 08 Nov 2024
https://github.com/sutdcv/SUTD-TrafficQA
[CVPR2021] SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events
annotations cvpr cvpr2021 dataset multimodal multimodal-deep-learning paper traffic-events video-qa video-reasoning vqa vqa-dataset
Last synced: 27 Oct 2024
https://github.com/naver/artemis
Official code release for ARTEMIS: Attention-based Retrieval with Text-Explicit Matching and Implicit Similarity (published at ICLR 2022)
image-retrieval multimodal-deep-learning multimodal-retrieval
Last synced: 08 Nov 2024
https://github.com/aehrc/cvt2distilgpt2
Improving Chest X-Ray Report Generation by Leveraging Warm-Starting
chest-xray-imaging distilgpt2 gpt-2 huggingface-transformers image-captioning medical-image-analysis mimic-cxr multimodal multimodal-deep-learning pytorch pytorch-lightning vision-transformer
Last synced: 25 Nov 2024
https://github.com/aimotive/aimotive_dataset
aiMotive public dataset
3d-object-detection autonomous-driving dataset multimodal-deep-learning object-tracking representation-learning
Last synced: 28 Oct 2024
https://ai4ce.github.io/MARS/
[CVPR2024] Multiagent Multitraversal Multimodal Self-Driving: Open MARS Dataset
3dgs collaborative-perception coperception cvpr2024 dataset multiagent multimodal-deep-learning nerf self-driving
Last synced: 08 Nov 2024
https://github.com/choyingw/cfcnet
NeurIPS 2019: Deep RGB-D Canonical Correlation Analysis For Sparse Depth Completion
3d canonical-correlation-analysis computer-vision deep-neural-networks depth-completion depth-estimation multimodal-deep-learning neurips-2019 nips-2019
Last synced: 06 Nov 2024
https://github.com/junweiliang/fvta_memexqa
Real-world photo sequence question answering system (MemexQA). CVPR'18 and TPAMI'19
memex-question-answering memexqa-dataset multimodal-datasets multimodal-deep-learning multimodal-representation vision-and-language visual-question-answering
Last synced: 08 Nov 2024
https://github.com/42jaylonw/shifu
Lightweight Isaac Gym Environment Builder
isaacgym multimodal-deep-learning reinforcement-learning robot-learning robotics
Last synced: 03 Nov 2024
https://github.com/visinf/lnfmm
Latent Normalizing Flows for Many-to-Many Cross Domain Mappings (ICLR 2020)
conditional-vae generative-models image-to-text latent-variable-models multimodal-deep-learning normalizing-flows text-to-image vision-and-language
Last synced: 28 Oct 2024
https://github.com/declare-lab/llm-puzzletest
This repository is maintained to release dataset and models for multimodal puzzle reasoning.
gemini gemini-pro gpt-4 language-model large-language-models llm llms multimodal-deep-learning multimodal-learning
Last synced: 08 Nov 2024
https://github.com/declare-lab/msa-robustness
NAACL 2022 paper on Analyzing Modality Robustness in Multimodal Sentiment Analysis
multimodal-deep-learning multimodal-sentiment-analysis robustness-analysis
Last synced: 08 Nov 2024
https://github.com/yuhui-zh15/drml
Official Code Release for "Diagnosing and Rectifying Vision Models using Language" (ICLR 2023)
computer-vision iclr2023 machine-learning multimodal-deep-learning natural-language-processing representation-learning
Last synced: 08 Nov 2024
https://github.com/ai4ce/MARS
[CVPR2024] Multiagent Multitraversal Multimodal Self-Driving: Open MARS Dataset
3dgs collaborative-perception coperception cvpr2024 dataset multiagent multimodal-deep-learning nerf self-driving
Last synced: 27 Oct 2024
https://github.com/kyegomez/multimodalcrossattn
The open source implementation of the cross attention mechanism from the paper: "JOINTLY TRAINING LARGE AUTOREGRESSIVE MULTIMODAL MODELS"
artificial-intelligence attention attention-is-all-you-need attention-mechanism attn gpt4 multimodal multimodal-deep-learning
Last synced: 09 Nov 2024
https://github.com/declare-lab/mm-align
[EMNLP 2022] This repository contains the official implementation of the paper "MM-Align: Learning Optimal Transport-based Alignment Dynamics for Fast and Accurate Inference on Missing Modality Sequences"
machine-learning multimodal-deep-learning multimodal-sentiment-analysis natural-language-processing optimal-transport
Last synced: 08 Nov 2024
https://github.com/asnelt/mmae
Package for Multimodal Autoencoders in TensorFlow / Keras
autoencoder autoencoders bregman-distance deep-learning keras keras-models keras-tensorflow multimodal-deep-learning multimodal-learning tensorflow
Last synced: 11 Oct 2024
https://github.com/declare-lab/m2h2-dataset
This repository contains the dataset and baselines explained in the paper: M2H2: A Multimodal Multiparty Hindi Dataset For Humor Recognition in Conversations
emotion-recognition-in-conversation humor-detection multimodal-deep-learning multimodal-fusion
Last synced: 08 Nov 2024
https://github.com/kyegomez/pegasus
PegasusX: The Future of Multimodal Embeddings 🦄 🦄
artificial-intelligence embeddings gpt4 multimodal multimodal-deep-learning openai
Last synced: 16 Nov 2024
https://github.com/frankaging/multimodal-transformer
Attention Based Multi-modal Emotion Recognition; Stanford Emotional Narratives Dataset
attention-model emotion emotion-recognition lstm-neural-networks multimodal-deep-learning narrative natural-language-processing storytelling transformer-architecture
Last synced: 11 Nov 2024
https://github.com/prithivirajdamodaran/vision-language-modelling-series
Companion Repo for the Vision Language Modelling YouTube series - https://bit.ly/3PsbsC2 - by Prithivi Da. Open to PRs and collaborations
multimodal-deep-learning multimodal-interactions multimodal-representation vision-and-language vision-and-language-navigation vision-and-language-pre-training
Last synced: 28 Nov 2024
https://github.com/laminetourelab/mogonet
MOGONET (Multi-Omics Graph cOnvolutional NETworks) is a multi-omics integrative data analysis framework for classification tasks in biomedical applications.
deep-learning graph-convolutional-networks machine-learning mogonet multi-omics multi-omics-integration multimodal-deep-learning pytorch
Last synced: 08 Nov 2024
https://github.com/kritiksoman/multimodal
Listen. Write. Speak. Read. Think.
multimodal-deep-learning named-entity-recognition natural-language-processing nlp-machine-learning question-answering sentiment-analysis speech-recognition text-generation text-to-speech
Last synced: 12 Nov 2024
https://github.com/choyingw/gais-net
CVPR 2020 Workshop on Scalability in Autonomous Driving: GAIS-Net: Geometry-Aware Instance Segmentation with Disparity Maps
3d autonomous-vehicles computer-vision cvpr2020 deep-neural-networks geometry-processing instance-segmentation mask-rcnn multimodal-deep-learning scalability stereo-vision
Last synced: 06 Nov 2024
https://github.com/kyegomez/gen2
Implementation of "Text driven video generation" in PyTorch
artificial-intelligence gpt4 multimodal multimodal-deep-learning multimodal-learning multimodality stablediffusion texttovideo
Last synced: 09 Nov 2024
https://github.com/adrianbzg/hyperbert
Code for "HyperBERT: Mixing Hypergraph-Aware Layers with Language Models for Node Classification on Text-Attributed Hypergraphs"
deep-learning hypergraphs language-model multimodal-deep-learning
Last synced: 05 Nov 2024
https://github.com/kyegomez/celestial-1
Omni-Modality Processing, Understanding, and Generation
attention attention-is-all-you-need attention-mechanisms gpt-4 gpt4 multi-modal multimodal multimodal-deep-learning multimodality omnimodal openai
Last synced: 09 Nov 2024
https://github.com/kyegomez/mmca-mgqa
Experiments around using Multi-Modal Causal Attention with Multi-Grouped Query Attention
artificial-intelligence attention attention-is-all-you-need attention-mechanism gpt4 multimodal multimodal-deep-learning multimodality
Last synced: 09 Nov 2024
https://github.com/janteichertkluge/dmlsim
This library provides packages on DoubleML / Causal Machine Learning and Neural Networks in Python for Simulation and Case Studies.
beit bert case-study causal causal-inference causal-machine-learning deep-learning dgp double-machine-learning doubleml machine-learning multi-modal multimodal multimodal-deep-learning neural-network simulation transformer transformers
Last synced: 14 Oct 2024
https://github.com/jiayuww/spatialeval
[NeurIPS'24] SpatialEval: a benchmark to evaluate spatial reasoning abilities of MLLMs and LLMs
claude foundation-models gemini gpt-4o gpt-4v large-language-models llama3 machine-learning multimodal-deep-learning reasoning spatial-reasoning vision-language-models
Last synced: 03 Dec 2024
https://github.com/chikap421/videosam
This repository accompanies the paper "VideoSAM: A Large Vision Foundation Model for High-Speed Video Segmentation"
cnn computer-vision multimodal-deep-learning segment-anything-model vision-transformers
Last synced: 30 Nov 2024
https://github.com/ashutosh1919/data2vec-pytorch
Ready to run PyTorch implementation of Data2Vec 2.0: Highly efficient self-supervised representation learning for vision, speech and text.
audio-machine-learning computer-vision data2vec deep-learning embedding-models multimodal-deep-learning nlp pytorch self-supervised-learning
Last synced: 27 Oct 2024
https://github.com/kyegomez/odin
SOTA Classification at scale for UAVs, Drones, and much more
computer-vision multimodal multimodal-data multimodal-deep-learning swarm-intelligence
Last synced: 09 Nov 2024
https://github.com/sitamgithub-msit/picq
PicQ: Demo for MiniCPM-V 2.6 to answer questions about images using natural language.
artificial-intelligence gradio gradio-interface huggingface-spaces huggingface-transformers minicpm-v multilingual-models multimodal-data multimodal-deep-learning question-answering
Last synced: 30 Nov 2024
https://github.com/adrianbzg/sfavel
Code for "Unsupervised Pretraining for Fact Verification by Language Model Distillation" (ICLR 2024)
deep-learning knowledge-distillation knowledge-graphs language-model multimodal-deep-learning natural-language-processing self-supervised-learning
Last synced: 05 Nov 2024
https://github.com/dnth/postgresql-multimodal-retrieval
Vector/Hybrid Search & Retrieval on PostgreSQL database using Vision Language Model.
huggingface-datasets multimodal-deep-learning pgvector postgresql retrieval search-engine vector-database vision-language-model
Last synced: 06 Dec 2024
https://github.com/deepmancer/deepmancer
"When in doubt, use brute force." - Ken Thompson
computer-science computer-vision deep-learning ml-engineering multimodal-deep-learning natural-language-processing software-engineering
Last synced: 13 Dec 2024
https://github.com/gustavocidornelas/fused-multimodal-emotion
Multimodal emotion recognition using lexico-acoustic language descriptions
deep-learning emotion-recognition multimodal-deep-learning
Last synced: 10 Nov 2024
https://github.com/sitamgithub-msit/vidiqa
VidiQA: Demo for MiniCPM-V 2.6 to answer questions about videos using natural language.
gradio gradio-interface huggingface-spaces huggingface-transformers minicpm-v multilingual-models multimodal-data multimodal-deep-learning python question-answering
Last synced: 30 Nov 2024
https://github.com/frankaging/generative-physics-inference
Slip or Not? Unsupervised Learning to Understand Physical Scene Using Multimodal Variational Physics Inference Network
cognitive-services generative-model intuitive-physics multimodal-deep-learning physics-simulation variational-autoencoder variational-inference
Last synced: 11 Nov 2024
https://github.com/iamdanialkamali/memotionanalysis
Meme Sentiment Analysis SemEval 2020 Task 9
keras multimodal-deep-learning tensorflow
Last synced: 22 Nov 2024
https://github.com/dermatologist/kedro-tf-text
Kedro pipelines for preprocessing text and tabular data for multi-modal ML in TensorFlow.
bert gpt hacktoberfest healthcare kedro medical multimodal-deep-learning nlp-machine-learning
Last synced: 21 Dec 2024
https://github.com/chikap421/mseg_vcuq
This repository accompanies the paper "MSEG-VCUQ: Multimodal SEGmentation with Enhanced Vision Foundation Models, Convolutional Neural Networks, and Uncertainty Quantification for High-Speed Video Phase Detection Data"
cnn computer-vision multimodal-deep-learning uncertainty-quantification vision-transformer
Last synced: 30 Nov 2024
https://github.com/ibnaleem/mikael
The open-source repository for Mikael, a Discord chatbot trained on Mistral and LLaVA language models
artificial-intelligence chatbot discord-bot discord-py gpt-4 large-language-models llava mistral mistral-7b mistral-ai multimodal multimodal-deep-learning
Last synced: 15 Dec 2024
https://github.com/darrylnurse/viewvie
Movie detection application.
computer-vision embeddings express ffmpeg google-cloud-platform multimodal-deep-learning nodejs python react react-router vector-database
Last synced: 05 Nov 2024
https://github.com/eva-kaushik/emkgcn-multimodal-music-recommender
The `MKGCN` class, coupled with the Spotify API, orchestrates a multi-modal knowledge graph convolutional network to enhance music recommendation systems by integrating user interaction data and diverse music modalities.
emkgcn multimodal-deep-learning music-recommendation-system spotify-api spotipy-library
Last synced: 05 Nov 2024
https://github.com/forestsking/awesome-multimodal-time-series
A curated list of papers, code, data, and other resources focused on multimodal time series analysis.
awesome awesome-list awesome-resources multimodal-deep-learning multimodal-time-series multimodality time-series time-series-analysis time-series-forecasting
Last synced: 14 Nov 2024
https://github.com/anne-andresen/multi-modal-cuda-c-gan
Raw C/CUDA implementation of a 3D GAN
3d 3d-models attention-mechanism c cross-attention cross-attention-c cuda gan gan-models low-level-programming medical-imaging multimodal-deep-learning pytorch transformer-pytorch transformers transformers-c
Last synced: 05 Nov 2024
https://github.com/nicolay-r/nicolay-r
My personal list of news updates in the information retrieval domain
information-retrieval language-model large-language-models multimodal-deep-learning nlp relation-extraction tensorflow torch tranformers
Last synced: 19 Dec 2024
https://github.com/sathya-ml/multimodal-vrnn-vae
A PyTorch implementation of multimodal VRNN and VAE.
conditional-generation generative-model multimodal multimodal-deep-learning multimodal-learning vae vae-implementation vae-pytorch variational-autoencoder vrnn
Last synced: 10 Nov 2024
https://github.com/aeternalis-ingenium/v4vision-poc-backend
API to infer automated disease detection and report generation from medical images.
llm machine-learning med-tech multimodal-deep-learning multimodality radiology software-engineering
Last synced: 14 Nov 2024
https://github.com/ahmdtaha/distributed_sigmoid_loss
Unofficial implementation for Sigmoid Loss for Language Image Pre-Training
contrastive-learning distributed-data-parallel multimodal-deep-learning python3 pytorch self-supervised-learning unsupervised-learning vision-and-language vision-language vision-language-pretraining vision-transformer
Last synced: 06 Nov 2024
https://github.com/macabdul9/torchmm
PyTorch Data loaders and abstraction for multi-modal data.
computer-vision multimodal-deep-learning natural-language-processing python pytorch speech-processing
Last synced: 01 Oct 2024
https://github.com/etienne-bobo/skimlit-nlp
The purpose of this project is to build an NLP model that makes reading medical abstracts easier.
keras-tensorflow multimodal-deep-learning nlp tensorflow2
Last synced: 11 Nov 2024
https://github.com/vasugi2003/fusion-ai---multimodal-persuvasiveness-prediction
Developed a system to predict persuasiveness using multi-modal data (text, images, audio). Utilized BERT for text embeddings, ResNet for image features, and Librosa for audio analysis. Fused data from all modalities for enhanced prediction accuracy.
ai bert-model fusion librosa multimodal-deep-learning python resnet-50 tensorflow
Last synced: 05 Nov 2024
https://github.com/slinusc/path-vqa-blip
Fine-tuning BLIP for pathological visual question answering.
blip multimodal-deep-learning pathology
Last synced: 10 Nov 2024