Projects in Awesome Lists tagged with multimodality

https://github.com/lucidrains/big-sleep

A simple command line tool for text to image generation, using OpenAI's CLIP and a BigGAN. Technique was originally created by https://twitter.com/advadnoun

artificial-intelligence deep-learning generative-adversarial-networks multimodality text-to-image

Last synced: 14 May 2025

https://github.com/hymie122/RAG-Survey

Collecting awesome papers of RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in the paper "Retrieval-Augmented Generation for AI-Generated Content: A Survey".

aigc diffusion-models llm multimodality rag survey

Last synced: 20 Apr 2025

https://github.com/aidc-ai/ovis

A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.

chatbot llama3 multimodal multimodal-large-language-models multimodality qwen vision-language-learning vision-language-model

Last synced: 17 Nov 2025

https://github.com/PreferredAI/cornac

A Comparative Framework for Multimodal Recommender Systems

collaborative-filtering matrix-factorization multimodal-learning multimodality recommendation-algorithms recommendation-engine recommendation-system recommender-system

Last synced: 18 Jul 2025

https://github.com/ArrowLuo/CLIP4Clip

An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"

activitynet clip didemo lsmdc msrvtt msvd multimodal multimodal-learning multimodality ranking retrieval retrieval-model search video-clip-retrieval video-text-retrieval

Last synced: 03 Apr 2025

https://github.com/fnzhan/Generative-AI

[TPAMI 2023] Multimodal Image Synthesis and Editing: The Generative AI Era

aigc diffusion-model gans multimodality nerfs

Last synced: 03 Apr 2025

https://github.com/aimclub/FEDOT

Automated modeling and machine learning framework FEDOT

automated-machine-learning automation automl evolutionary-algorithms fedot genetic-programming hyperparameter-optimization machine-learning multimodality parameter-tuning structural-learning

Last synced: 18 Apr 2025

https://github.com/vita-mllm/woodpecker

✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models

hallucination hallucinations large-language-models llm mllm multimodal-large-language-models multimodality

Last synced: 15 May 2025

https://github.com/bradyfu/woodpecker

✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models

hallucination hallucinations large-language-models llm mllm multimodal-large-language-models multimodality

Last synced: 31 Mar 2025

https://github.com/VITA-MLLM/Woodpecker

✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.

hallucination hallucinations large-language-models llm mllm multimodal-large-language-models multimodality

Last synced: 09 May 2025

https://github.com/BAAI-Agents/Cradle

The Cradle framework is a first attempt at General Computer Control (GCC). Cradle helps agents ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation in a standardized general environment with minimal requirements.

ai ai-agent ai-agents-framework computer-control cradle foundation-agent gcc general-computer-control generative-ai grounding large-language-models llm lmm multimodality personoid vision-language-model vlm

Last synced: 07 May 2025

https://github.com/microsoft/llm2clip

LLM2CLIP makes SOTA pretrained CLIP models more SOTA than ever.

clip fundation-models multimodality

Last synced: 11 Apr 2025

https://github.com/afiaka87/clip-guided-diffusion

A CLI tool/python module for generating images from text using guided diffusion and CLIP from OpenAI.

artificial-intelligence deep-learning diffusion image-generation multimodal multimodality openai openai-clip text-to-image text-to-image-synthesis

Last synced: 05 Aug 2025

https://github.com/zengyan-97/X-VLM

X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)

multimodality vision-and-language x-vlm

Last synced: 03 Oct 2025

https://github.com/MMMU-Benchmark/MMMU

This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

computer-vision deep-learning deep-neural-networks evaluation foundation-models large-language-models large-multimodal-models llm llms machine-learning multimodal multimodal-deep-learning multimodal-learning multimodality natural-language-processing question-answering stem visual-question-answering

Last synced: 17 Apr 2025

https://github.com/hazyresearch/fonduer

A knowledge base construction engine for richly formatted data

knowledge-base-construction machine-learning multimodality

Last synced: 13 Apr 2025

https://github.com/HazyResearch/fonduer

A knowledge base construction engine for richly formatted data

knowledge-base-construction machine-learning multimodality

Last synced: 08 May 2025

https://github.com/lium-lst/nmtpytorch

Sequence-to-Sequence Framework in PyTorch

asr cnn deep-learning multimodality neural-machine-translation nmt pytorch seq2seq speech-recognition

Last synced: 07 May 2025

https://github.com/microsoft/LLM2CLIP

LLM2CLIP makes SOTA pretrained CLIP models more SOTA than ever.

clip fundation-models multimodality

Last synced: 10 Aug 2025

https://github.com/jshilong/gpt4roi

GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest

computer-vision gpt llm multimodality roi

Last synced: 06 Apr 2025

https://github.com/kyegomez/cm3leon

An open source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", an all-new multi modal AI that uses just a decoder to generate both text and images

attention attention-is-all-you-need dalle imagegeneration multimodal multimodal-learning multimodality

Last synced: 06 Apr 2025

https://github.com/microsoft/univl

An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"

alignment caption caption-task coin joint localization msrvtt multimodal-sentiment-analysis multimodality pretrain pretraining retrieval-task segmentation video video-language video-text video-text-retrieval youcookii

Last synced: 05 Apr 2025

https://github.com/soujanyaporia/multimodal-sentiment-analysis

Attention-based multimodal fusion for sentiment analysis

attention attention-mechanism conversational-agents dialogue-systems lstm multimodality natural-language-processing sentiment-analysis sentiment-classification tensorflow

Last synced: 05 Apr 2025

https://github.com/kyegomez/navit

My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"

attention-mechanism clip gpt4 multimodal multimodal-deep-learning multimodal-learning multimodality vit

Last synced: 16 May 2025

https://github.com/kyegomez/Med-PaLM

Towards Generalist Biomedical AI

biomedical deep-learning gpt4 multimodal multimodal-deep-learning multimodality opensource

Last synced: 16 Mar 2025

https://github.com/kyegomez/med-palm

Towards Generalist Biomedical AI

biomedical deep-learning gpt4 multimodal multimodal-deep-learning multimodality opensource

Last synced: 05 Apr 2025

https://github.com/Liang-ZX/VectorNet

Pytorch implementation of CVPR2020 paper “VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation”

gnn multimodality trajectory-prediction

Last synced: 20 Mar 2025

https://github.com/srvk/how2-dataset

This repository contains code and metadata of the How2 dataset

corpus dataset how2-dataset language machine-translation multimodality speech-recognition video

Last synced: 27 Mar 2025

https://github.com/foundationvision/generateu

[CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection

mllm multimodality object-detection open-vocabulary open-vocabulary-detection open-world

Last synced: 05 Apr 2025

https://github.com/biomedsciai/fuse-med-ml

A python framework accelerating ML based discovery in the medical field by encouraging code reuse. Batteries included :)

ai cmmd collaboration ct deep-learning fuse fuse-med-ml fusemedml hacktoberfest healthcare isic knight-challenge machine-learning medical medical-imaging multimodality python pytorch stoic vision

Last synced: 16 May 2025

https://github.com/kyegomez/pali3

Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"

artificial-intelligence autogpt gpt4 machine-learning multimodal multimodal-deep-learning multimodal-learning multimodality

Last synced: 13 Apr 2025

https://github.com/BiomedSciAI/fuse-med-ml

A python framework accelerating ML based discovery in the medical field by encouraging code reuse. Batteries included :)

ai cmmd collaboration ct deep-learning fuse fuse-med-ml fusemedml hacktoberfest healthcare isic knight-challenge machine-learning medical medical-imaging multimodality python pytorch stoic vision

Last synced: 08 May 2025

https://github.com/kyegomez/swarms-pytorch

Swarming algorithms like PSO, Ant Colony, Sakana, and more in PyTorch 😊

artificial-intelligence firefly gpt4 hivemind machine-learning multimodal multimodal-deep-learning multimodality networks neural-network pso swarm-algorithm swarm-intelligence swarm-optimization swarm-robotics swarms swarms-of-agents

Last synced: 04 Apr 2025

https://github.com/senwu/emmental

A deep learning framework for building multimodal multi-task learning systems.

machine-learning multi-task-learning multimodality

Last synced: 10 Apr 2025

https://github.com/kyegomez/pali

Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model"

artificial-intelligence gpt4 machine-learning multimodal multimodal-deep-learning multimodality

Last synced: 08 Sep 2025

https://github.com/lucidrains/mirasol-pytorch

Implementation of 🌻 Mirasol, SOTA Multimodal Autoregressive model out of Google Deepmind, in Pytorch

artificial-intelligence attention-mechanism deep-learning multimodality transformers

Last synced: 30 Oct 2025

https://github.com/forestsking/chattime

PyTorch implementation of "ChatTime: A Unified Multimodal Time Series Foundation Model Bridging Numerical and Textual Data" (AAAI 2025 [oral])

foundation-models multimodal-time-series multimodality time-series

Last synced: 08 Oct 2025

https://github.com/firojalam/multimodal_social_media

Multimodal social media content (text, image) classification

cnn-classification crisis-computing crisis-informatics disaster-response image-classification image-processing keras-tensorflow multimodal-deep-learning multimodality social-media text-classification tweet-classification

Last synced: 16 Feb 2026

https://github.com/amazon-science/gluonmm

A library of transformer models for computer vision and multi-modality research

computer-vision iccv-2021 multimodality pytorch transformer video

Last synced: 03 May 2025

https://github.com/firojalam/harmful-memes-detection-resources

Resources (conference/journal publications, references to dataset) for harmful memes detection.

disinformation harmful-memes memes multimodality

Last synced: 10 Oct 2025

https://github.com/kyegomez/EXA-1

An EXA-Scale repository of Multi-Modality AI resources from papers and models, to foundational libraries!

artificial-intelligence dataset gpt4 jax kosmos large-dataset large-language-models multimodal multimodal-data multimodality pytorch pytorch-implementation triton

Last synced: 28 Mar 2025

https://github.com/ukplab/5pils

Code associated with the EMNLP 2024 Main paper: "Image, tell me your story!" Predicting the original meta-context of visual misinformation.

fact-checking fake-news large-language-models misinformation multimodal-large-language-models multimodality reverse-image-search

Last synced: 21 Jul 2025

https://github.com/kunzhan/mvgl

TCyb 2018: Graph learning for multiview clustering

clustering multimodal multimodality multiple-features multiview-clustering multiview-learning unsupervised-learning

Last synced: 28 Jun 2025

https://github.com/turinglang/mcmctempering.jl

Implementations of parallel tempering algorithms to augment samplers with tempering capabilities

julia-language mcmc mcmc-sampling multimodality probabilistic-programming turing-language

Last synced: 13 Apr 2025

https://github.com/xability/maidr-legacy

[DEPRECATED prototype] Multimodal Access and Interactive Data Representation

ai blind braille chart data description image impairments llm low-vision multimodality plot representation science sonification tactile visual visualization

Last synced: 12 Feb 2026

https://github.com/piomin/spring-ai-showcase

Sample Spring AI Application with several use cases

llm mistral-ai multimodal-large-language-models multimodality ollama openai pinecone rag retrieval-augmented-generation spring-ai spring-boot stock-api vector-store

Last synced: 15 Apr 2025

https://github.com/thechymera/behaviopy

Behavioral data analysis and plotting in Python.

animal-behavior biomedical data-science foss multimodality plotting

Last synced: 19 Apr 2025

https://github.com/servicenow/sygra

SyGra - Graph-oriented Synthetic data generation Pipeline

ai dpo image-datasets llm-datasets llm-framework llm-training-data low-code-no-code multimodality open-source python sft-data synthetic-data synthetic-dataset-generation

Last synced: 09 Oct 2025

https://github.com/kyegomez/mmca

The open source community's implementation of the all-new Multi-Modal Causal Attention from "DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention"

artificial-intelligence attention attention-is-all-you-need attention-mechanism gpt4 multimodal multimodality neural-network neuralnetwork opensource-library opensourceforgood

Last synced: 07 May 2025

https://github.com/declare-lab/sealing

[NAACL 2024] Official Implementation of paper "Self-Adaptive Sampling for Efficient Video Question Answering on Image-Text Models"

multimodality naacl2024 video-question-answering video-understanding visual-language-models

Last synced: 14 Apr 2025

https://github.com/kyegomez/gen2

Implementation of "Text driven video generation" in PyTorch

artificial-intelligence gpt4 multimodal multimodal-deep-learning multimodal-learning multimodality stablediffusion texttovideo

Last synced: 02 Jul 2025

https://github.com/kyegomez/swarmalators

Pytorch Implementation of the Swarmalators algorithm from "Exotic swarming dynamics of high-dimensional swarmalators"

artificial-intelligence attention-is-all-you-need attention-mechanism machine-learning-algorithms multimodal multimodality swarm-cluster swarm-intelligence swarm-robotics swarms

Last synced: 07 May 2025

https://github.com/kyegomez/convnet

Implementation of the NFNets from the paper: "ConvNets Match Vision Transformers at Scale" by Google Research

ai convolutional-layers convolutional-neural-networks deeplearning machine-learning ml multimodal-learning multimodality

Last synced: 09 Oct 2025

https://github.com/kyegomez/celestial-1

Omni-Modality Processing, Understanding, and Generation

attention attention-is-all-you-need attention-mechanisms gpt-4 gpt4 multi-modal multimodal multimodal-deep-learning multimodality omnimodal openai

Last synced: 10 Mar 2026

https://github.com/soraxas/occ-traj120

A trajectories dataset with associated occupancy maps

dataset multimodality trajectory-clustering trajectory-planning trajectory-prediction

Last synced: 09 Feb 2026

https://github.com/mjunaidca/upwork-leads-gpt

Upwork Leads GPT is an AI-powered job finder tool for freelancers. It's built using OpenAI’s CustomGPT. It searches for the most relevant job postings based on provided keywords and can generate proposals.

api cloudflared conversational-ai conversational-ui cui custom-gpt fastapi job-search leads-generation multimodality openai-chatgpt upwork upwork-feed

Last synced: 01 Aug 2025

https://github.com/kyegomez/mmca-mgqa

Experiments around using Multi-Modal Causal Attention with Multi-Grouped Query Attention

artificial-intelligence attention attention-is-all-you-need attention-mechanism gpt4 multimodal multimodal-deep-learning multimodality

Last synced: 13 Oct 2025

https://github.com/warvito/integrating-multi-modal-neuroimaging

Integrating machine learning and multi-modal neuroimaging to detect schizophrenia at the level of the individual

multimodality neuroimage neuroimaging schizophrenia svm svm-classifier

Last synced: 23 Mar 2025

https://github.com/ishan-surana/federanet

Federated Multimodal Cyberattack classification model on social media messages (utilizing blockchain and quantum key cryptography). Blockchain server at https://cyberattack-blockchain.onrender.com/. Model interaction link below.

bb84-protocol blockchain convolutional-neural-networks cryptography federated-learning flask html-css-js information-security machine-learning multimodality natural-language-processing proof-of-authority streamlit zero-knowledge-proof

Last synced: 10 May 2026

https://github.com/stiebo/spring-ai-samples

Spring AI, chat client, vector store, RAG, multimodality samples

embeddings llm multimodality rag spring-ai spring-boot vectorstore

Last synced: 17 Apr 2026

https://github.com/nicolay-r/awesome-image-captioning-mllms

A curated list of awesome image captioning studies, aimed at annotating and reporting CT / MRI scans

image languagemodels multimodal-large-language-models multimodality nlp reports text

Last synced: 09 Feb 2026

https://github.com/elphinkuo/wheebrain

WheelBrain – Driving Intelligence to New Frontiers. Imagine a world where driving isn’t just autonomous but also exceptionally intelligent, safe and adaptive. That’s what WheelBrain brings to the table. Welcome to the future of smarter driving. This is a project for the SambaNova Lightning Fast AI Hackathon.

agent autonomous-driving large-language-model llama3 llm multimodality planning reasoning

Last synced: 18 Feb 2026

https://github.com/nagababumo/building-applications-with-vector-databases

Last synced: 11 May 2026

https://github.com/elijahnzeli1/salesai

A unified multimodal generative AI system designed to learn and adapt across multiple modalities (text, audio, vision, robotics) with minimal data and long-term autonomy through reinforcement learning.

agi ai generative machine-learning ml ml-generative multimodal multimodal-ai multimodal-deep-learning multimodality unified unified-multimodal-models

Last synced: 12 Sep 2025

https://github.com/hase3b/bonbid-hie-segmentation

This repository implements a 3D U-Net model for segmenting HIE lesions in neonatal MRI scans, exploring various loss functions including Dice, Dice-Focal, Tversky, Hausdorff Distance, and hybrid loss functions. BONBID-HIE dataset is used for this study.

lesions loss-functions miccai2023 monai multimodality pytorch segmentation simpleitk torchio unet

Last synced: 22 May 2026