Projects in Awesome Lists tagged with multimodality
A curated list of projects in awesome lists tagged with multimodality.
https://github.com/lucidrains/big-sleep
A simple command line tool for text to image generation, using OpenAI's CLIP and a BigGAN. Technique was originally created by https://twitter.com/advadnoun
artificial-intelligence deep-learning generative-adversarial-networks multimodality text-to-image
Last synced: 14 May 2025
https://github.com/hymie122/RAG-Survey
Collecting awesome papers on RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in the paper "Retrieval-Augmented Generation for AI-Generated Content: A Survey".
aigc diffusion-models llm multimodality rag survey
Last synced: 20 Apr 2025
https://github.com/aidc-ai/ovis
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
chatbot llama3 multimodal multimodal-large-language-models multimodality qwen vision-language-learning vision-language-model
Last synced: 17 Nov 2025
https://github.com/PreferredAI/cornac
A Comparative Framework for Multimodal Recommender Systems
collaborative-filtering matrix-factorization multimodal-learning multimodality recommendation-algorithms recommendation-engine recommendation-system recommender-system
Last synced: 18 Jul 2025
https://github.com/ArrowLuo/CLIP4Clip
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
activitynet clip didemo lsmdc msrvtt msvd multimodal multimodal-learning multimodality ranking retrieval retrieval-model search video-clip-retrieval video-text-retrieval
Last synced: 03 Apr 2025
https://github.com/fnzhan/Generative-AI
[TPAMI 2023] Multimodal Image Synthesis and Editing: The Generative AI Era
aigc diffusion-model gans multimodality nerfs
Last synced: 03 Apr 2025
https://github.com/aimclub/FEDOT
Automated modeling and machine learning framework FEDOT
automated-machine-learning automation automl evolutionary-algorithms fedot genetic-programming hyperparameter-optimization machine-learning multimodality parameter-tuning structural-learning
Last synced: 18 Apr 2025
https://github.com/vita-mllm/woodpecker
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models
hallucination hallucinations large-language-models llm mllm multimodal-large-language-models multimodality
Last synced: 15 May 2025
https://github.com/bradyfu/woodpecker
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models
hallucination hallucinations large-language-models llm mllm multimodal-large-language-models multimodality
Last synced: 31 Mar 2025
https://github.com/VITA-MLLM/Woodpecker
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.
hallucination hallucinations large-language-models llm mllm multimodal-large-language-models multimodality
Last synced: 09 May 2025
https://github.com/BAAI-Agents/Cradle
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.
ai ai-agent ai-agents-framework computer-control cradle foundation-agent gcc general-computer-control generative-ai grounding large-language-models llm lmm multimodality personoid vision-language-model vlm
Last synced: 07 May 2025
https://github.com/microsoft/llm2clip
LLM2CLIP makes SOTA pretrained CLIP models more SOTA than ever.
clip fundation-models multimodality
Last synced: 11 Apr 2025
https://github.com/afiaka87/clip-guided-diffusion
A CLI tool/python module for generating images from text using guided diffusion and CLIP from OpenAI.
artificial-intelligence deep-learning diffusion image-generation multimodal multimodality openai openai-clip text-to-image text-to-image-synthesis
Last synced: 05 Aug 2025
https://github.com/zengyan-97/X-VLM
X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)
multimodality vision-and-language x-vlm
Last synced: 03 Oct 2025
https://github.com/MMMU-Benchmark/MMMU
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
computer-vision deep-learning deep-neural-networks evaluation foundation-models large-language-models large-multimodal-models llm llms machine-learning multimodal multimodal-deep-learning multimodal-learning multimodality natural-language-processing question-answering stem visual-question-answering
Last synced: 17 Apr 2025
https://github.com/hazyresearch/fonduer
A knowledge base construction engine for richly formatted data
knowledge-base-construction machine-learning multimodality
Last synced: 13 Apr 2025
https://github.com/lium-lst/nmtpytorch
Sequence-to-Sequence Framework in PyTorch
asr cnn deep-learning multimodality neural-machine-translation nmt pytorch seq2seq speech-recognition
Last synced: 07 May 2025
https://github.com/jshilong/gpt4roi
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
computer-vision gpt llm multimodality roi
Last synced: 06 Apr 2025
https://github.com/kyegomez/cm3leon
An open source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", an all-new multi modal AI that uses just a decoder to generate both text and images
attention attention-is-all-you-need dalle imagegeneration multimodal multimodal-learning multimodality
Last synced: 06 Apr 2025
https://github.com/microsoft/univl
An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"
alignment caption caption-task coin joint localization msrvtt multimodal-sentiment-analysis multimodality pretrain pretraining retrieval-task segmentation video video-language video-text video-text-retrieval youcookii
Last synced: 05 Apr 2025
https://github.com/soujanyaporia/multimodal-sentiment-analysis
Attention-based multimodal fusion for sentiment analysis
attention attention-mechanism conversational-agents dialogue-systems lstm multimodality natural-language-processing sentiment-analysis sentiment-classification tensorflow
Last synced: 05 Apr 2025
https://github.com/kyegomez/navit
My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"
attention-mechanism clip gpt4 multimodal multimodal-deep-learning multimodal-learning multimodality vit
Last synced: 16 May 2025
https://github.com/kyegomez/Med-PaLM
Towards Generalist Biomedical AI
biomedical deep-learning gpt4 multimodal multimodal-deep-learning multimodality opensource
Last synced: 16 Mar 2025
https://github.com/Liang-ZX/VectorNet
PyTorch implementation of the CVPR 2020 paper “VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation”
gnn multimodality trajectory-prediction
Last synced: 20 Mar 2025
https://github.com/srvk/how2-dataset
This repository contains code and metadata for the How2 dataset
corpus dataset how2-dataset language machine-translation multimodality speech-recognition video
Last synced: 27 Mar 2025
https://github.com/foundationvision/generateu
[CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection
mllm multimodality object-detection open-vocabulary open-vocabulary-detection open-world
Last synced: 05 Apr 2025
https://github.com/biomedsciai/fuse-med-ml
A Python framework accelerating ML-based discovery in the medical field by encouraging code reuse. Batteries included :)
ai cmmd collaboration ct deep-learning fuse fuse-med-ml fusemedml hacktoberfest healthcare isic knight-challenge machine-learning medical medical-imaging multimodality python pytorch stoic vision
Last synced: 16 May 2025
https://github.com/kyegomez/pali3
Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"
artificial-intelligence autogpt gpt4 machine-learning multimodal multimodal-deep-learning multimodal-learning multimodality
Last synced: 13 Apr 2025
https://github.com/kyegomez/swarms-pytorch
Swarming algorithms like PSO, Ant Colony, Sakana, and more in PyTorch 😊
artificial-intelligence firefly gpt4 hivemind machine-learning multimodal multimodal-deep-learning multimodality networks neural-network pso swarm-algorithm swarm-intelligence swarm-optimization swarm-robotics swarms swarms-of-agents
Last synced: 04 Apr 2025
https://github.com/senwu/emmental
A deep learning framework for building multimodal multi-task learning systems.
machine-learning multi-task-learning multimodality
Last synced: 10 Apr 2025
https://github.com/kyegomez/pali
Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model"
artificial-intelligence gpt4 machine-learning multimodal multimodal-deep-learning multimodality
Last synced: 08 Sep 2025
https://github.com/lucidrains/mirasol-pytorch
Implementation of 🌻 Mirasol, SOTA Multimodal Autoregressive model out of Google Deepmind, in Pytorch
artificial-intelligence attention-mechanism deep-learning multimodality transformers
Last synced: 30 Oct 2025
https://github.com/forestsking/chattime
PyTorch implementation of "ChatTime: A Unified Multimodal Time Series Foundation Model Bridging Numerical and Textual Data" (AAAI 2025 [oral])
foundation-models multimodal-time-series multimodality time-series
Last synced: 08 Oct 2025
https://github.com/firojalam/multimodal_social_media
Multimodal social media content (text, image) classification
cnn-classification crisis-computing crisis-informatics disaster-response image-classification image-processing keras-tensorflow multimodal-deep-learning multimodality social-media text-classification tweet-classification
Last synced: 16 Feb 2026
https://github.com/amazon-science/gluonmm
A library of transformer models for computer vision and multi-modality research
computer-vision iccv-2021 multimodality pytorch transformer video
Last synced: 03 May 2025
https://github.com/firojalam/harmful-memes-detection-resources
Resources (conference/journal publications, references to datasets) for harmful memes detection.
disinformation harmful-memes memes multimodality
Last synced: 10 Oct 2025
https://github.com/kyegomez/EXA-1
An EXA-Scale repository of Multi-Modality AI resources from papers and models, to foundational libraries!
artificial-intelligence dataset gpt4 jax kosmos large-dataset large-language-models multimodal multimodal-data multimodality pytorch pytorch-implementation triton
Last synced: 28 Mar 2025
https://github.com/ukplab/5pils
Code associated with the EMNLP 2024 Main paper: "Image, tell me your story!" Predicting the original meta-context of visual misinformation.
fact-checking fake-news large-language-models misinformation multimodal-large-language-models multimodality reverse-image-search
Last synced: 21 Jul 2025
https://github.com/kunzhan/mvgl
TCyb 2018: Graph learning for multiview clustering
clustering multimodal multimodality multiple-features multiview-clustering multiview-learning unsupervised-learning
Last synced: 28 Jun 2025
https://github.com/turinglang/mcmctempering.jl
Implementations of parallel tempering algorithms to augment samplers with tempering capabilities
julia-language mcmc mcmc-sampling multimodality probabilistic-programming turing-language
Last synced: 13 Apr 2025
https://github.com/xability/maidr-legacy
[DEPRECATED prototype] Multimodal Access and Interactive Data Representation
ai blind braille chart data description image impairments llm low-vision multimodality plot representation science sonification tactile visual visualization
Last synced: 12 Feb 2026
https://github.com/piomin/spring-ai-showcase
Sample Spring AI Application with several use cases
llm mistral-ai multimodal-large-language-models multimodality ollama openai pinecone rag retrieval-augmented-generation spring-ai spring-boot stock-api vector-store
Last synced: 15 Apr 2025
https://github.com/thechymera/behaviopy
Behavioral data analysis and plotting in Python.
animal-behavior biomedical data-science foss multimodality plotting
Last synced: 19 Apr 2025
https://github.com/servicenow/sygra
SyGra - Graph-oriented Synthetic data generation Pipeline
ai dpo image-datasets llm-datasets llm-framework llm-training-data low-code-no-code multimodality open-source python sft-data synthetic-data synthetic-dataset-generation
Last synced: 09 Oct 2025
https://github.com/kyegomez/mmca
The open source community's implementation of the all-new Multi-Modal Causal Attention from "DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention"
artificial-intelligence attention attention-is-all-you-need attention-mechanism gpt4 multimodal multimodality neural-network neuralnetwork opensource-library opensourceforgood
Last synced: 07 May 2025
https://github.com/declare-lab/sealing
[NAACL 2024] Official implementation of the paper "Self-Adaptive Sampling for Efficient Video Question Answering on Image-Text Models"
multimodality naacl2024 video-question-answering video-understanding visual-language-models
Last synced: 14 Apr 2025
https://github.com/kyegomez/gen2
Implementation of "Text driven video generation" in pytorch
artificial-intelligence gpt4 multimodal multimodal-deep-learning multimodal-learning multimodality stablediffusion texttovideo
Last synced: 02 Jul 2025
https://github.com/kyegomez/swarmalators
Pytorch Implementation of the Swarmalators algorithm from "Exotic swarming dynamics of high-dimensional swarmalators"
artificial-intelligence attention-is-all-you-need attention-mechanism machine-learning-algorithms multimodal multimodality swarm-cluster swarm-intelligence swarm-robotics swarms
Last synced: 07 May 2025
https://github.com/kyegomez/convnet
Implementation of the NFNets from the paper: "ConvNets Match Vision Transformers at Scale" by Google Research
ai convolutional-layers convolutional-neural-networks deeplearning machine-learning ml multimodal-learning multimodality
Last synced: 09 Oct 2025
https://github.com/kyegomez/celestial-1
Omni-Modality Processing, Understanding, and Generation
attention attention-is-all-you-need attention-mechanisms gpt-4 gpt4 multi-modal multimodal multimodal-deep-learning multimodality omnimodal openai
Last synced: 10 Mar 2026
https://github.com/soraxas/occ-traj120
A trajectories dataset with associated occupancy maps
dataset multimodality trajectory-clustering trajectory-planning trajectory-prediction
Last synced: 09 Feb 2026
https://github.com/mjunaidca/upwork-leads-gpt
Upwork Leads GPT is an AI-powered Job Finder tool for freelancers. It's built using OpenAI’s CustomGPT. It searches for the most relevant job postings based on provided keywords and can generate proposals.
api cloudflared conversational-ai conversational-ui cui custom-gpt fastapi job-search leads-generation multimodality openai-chatgpt upwork upwork-feed
Last synced: 01 Aug 2025
https://github.com/kyegomez/mmca-mgqa
Experiments around using Multi-Modal Causal Attention with Multi-Grouped Query Attention
artificial-intelligence attention attention-is-all-you-need attention-mechanism gpt4 multimodal multimodal-deep-learning multimodality
Last synced: 13 Oct 2025
https://github.com/warvito/integrating-multi-modal-neuroimaging
Integrating machine learning and multi-modal neuroimaging to detect schizophrenia at the level of the individual
multimodality neuroimage neuroimaging schizophrenia svm svm-classifier
Last synced: 23 Mar 2025
https://github.com/ishan-surana/federanet
Federated Multimodal Cyberattack classification model on social media messages (utilizing blockchain and quantum key cryptography). Blockchain server at https://cyberattack-blockchain.onrender.com/. Model interaction link below.
bb84-protocol blockchain convolutional-neural-networks cryptography federated-learning flask html-css-js information-security machine-learning multimodality natural-language-processing proof-of-authority streamlit zero-knowledge-proof
Last synced: 10 May 2026
https://github.com/stiebo/spring-ai-samples
Spring AI, chat client, vector store, RAG, multimodality samples
embeddings llm multimodality rag spring-ai spring-boot vectorstore
Last synced: 17 Apr 2026
https://github.com/nicolay-r/awesome-image-captioning-mllms
A curated list of awesome image captioning studies, aimed at annotating and reporting CT / MRI scans
image languagemodels multimodal-large-language-models multimodality nlp reports text
Last synced: 09 Feb 2026
https://github.com/elphinkuo/wheebrain
WheelBrain – Driving Intelligence to New Frontiers. Imagine a world where driving isn’t just autonomous but also exceptionally intelligent, safe and adaptive. That’s what WheelBrain brings to the table. Welcome to the future of smarter driving. This is a project for SambaNova Lightning Fast AI Hackathon.
agent autonomous-driving large-language-model llama3 llm multimodality planning reasoning
Last synced: 18 Feb 2026
https://github.com/elijahnzeli1/salesai
A unified multimodal generative AI system designed to learn and adapt across multiple modalities (text, audio, vision, robotics) with minimal data and long-term autonomy through reinforcement learning.
agi ai generative machine-learning ml ml-generative multimodal multimodal-ai multimodal-deep-learning multimodality unified unified-multimodal-models
Last synced: 12 Sep 2025
https://github.com/hase3b/bonbid-hie-segmentation
This repository implements a 3D U-Net model for segmenting HIE lesions in neonatal MRI scans, exploring various loss functions including Dice, Dice-Focal, Tversky, Hausdorff Distance, and hybrid loss functions. BONBID-HIE dataset is used for this study.
lesions loss-functions miccai2023 monai multimodality pytorch segmentation simpleitk torchio unet
Last synced: 22 May 2026