An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with multimodality

A curated list of projects in awesome lists tagged with multimodality.

https://github.com/lucidrains/big-sleep

A simple command line tool for text to image generation, using OpenAI's CLIP and a BigGAN. Technique was originally created by https://twitter.com/advadnoun

artificial-intelligence deep-learning generative-adversarial-networks multimodality text-to-image

Last synced: 14 May 2025

https://github.com/hymie122/RAG-Survey

Collecting awesome papers on RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in the paper "Retrieval-Augmented Generation for AI-Generated Content: A Survey".

aigc diffusion-models llm multimodality rag survey

Last synced: 20 Apr 2025

https://github.com/aidc-ai/ovis

A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.

chatbot llama3 multimodal multimodal-large-language-models multimodality qwen vision-language-learning vision-language-model

Last synced: 17 Nov 2025

https://github.com/ArrowLuo/CLIP4Clip

An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"

activitynet clip didemo lsmdc msrvtt msvd multimodal multimodal-learning multimodality ranking retrieval retrieval-model search video-clip-retrieval video-text-retrieval

Last synced: 03 Apr 2025

https://github.com/fnzhan/Generative-AI

[TPAMI 2023] Multimodal Image Synthesis and Editing: The Generative AI Era

aigc diffusion-model gans multimodality nerfs

Last synced: 03 Apr 2025

https://github.com/vita-mllm/woodpecker

✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models

hallucination hallucinations large-language-models llm mllm multimodal-large-language-models multimodality

Last synced: 15 May 2025

https://github.com/bradyfu/woodpecker

✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models

hallucination hallucinations large-language-models llm mllm multimodal-large-language-models multimodality

Last synced: 31 Mar 2025

https://github.com/VITA-MLLM/Woodpecker

✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.

hallucination hallucinations large-language-models llm mllm multimodal-large-language-models multimodality

Last synced: 09 May 2025

https://github.com/BAAI-Agents/Cradle

The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.

ai ai-agent ai-agents-framework computer-control cradle foundation-agent gcc general-computer-control generative-ai grounding large-language-models llm lmm multimodality personoid vision-language-model vlm

Last synced: 07 May 2025

https://github.com/microsoft/llm2clip

LLM2CLIP makes SOTA pretrained CLIP models more SOTA than ever.

clip fundation-models multimodality

Last synced: 11 Apr 2025

https://github.com/afiaka87/clip-guided-diffusion

A CLI tool/python module for generating images from text using guided diffusion and CLIP from OpenAI.

artificial-intelligence deep-learning diffusion image-generation multimodal multimodality openai openai-clip text-to-image text-to-image-synthesis

Last synced: 05 Aug 2025

https://github.com/zengyan-97/X-VLM

X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)

multimodality vision-and-language x-vlm

Last synced: 03 Oct 2025

https://github.com/hazyresearch/fonduer

A knowledge base construction engine for richly formatted data

knowledge-base-construction machine-learning multimodality

Last synced: 13 Apr 2025

https://github.com/HazyResearch/fonduer

A knowledge base construction engine for richly formatted data

knowledge-base-construction machine-learning multimodality

Last synced: 08 May 2025

https://github.com/microsoft/LLM2CLIP

LLM2CLIP makes SOTA pretrained CLIP models more SOTA than ever.

clip fundation-models multimodality

Last synced: 10 Aug 2025

https://github.com/jshilong/gpt4roi

GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest

computer-vision gpt llm multimodality roi

Last synced: 06 Apr 2025

https://github.com/kyegomez/cm3leon

An open source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", an all-new multi modal AI that uses just a decoder to generate both text and images

attention attention-is-all-you-need dalle imagegeneration multimodal multimodal-learning multimodality

Last synced: 06 Apr 2025

https://github.com/microsoft/univl

An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"

alignment caption caption-task coin joint localization msrvtt multimodal-sentiment-analysis multimodality pretrain pretraining retrieval-task segmentation video video-language video-text video-text-retrieval youcookii

Last synced: 05 Apr 2025

https://github.com/kyegomez/navit

My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"

attention-mechanism clip gpt4 multimodal multimodal-deep-learning multimodal-learning multimodality vit

Last synced: 16 May 2025

https://github.com/Liang-ZX/VectorNet

Pytorch implementation of CVPR2020 paper “VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation”

gnn multimodality trajectory-prediction

Last synced: 20 Mar 2025

https://github.com/srvk/how2-dataset

This repository contains code and metadata of the How2 dataset

corpus dataset how2-dataset language machine-translation multimodality speech-recognition video

Last synced: 27 Mar 2025

https://github.com/foundationvision/generateu

[CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection

mllm multimodality object-detection open-vocabulary open-vocabulary-detection open-world

Last synced: 05 Apr 2025

https://github.com/biomedsciai/fuse-med-ml

A python framework accelerating ML based discovery in the medical field by encouraging code reuse. Batteries included :)

ai cmmd collaboration ct deep-learning fuse fuse-med-ml fusemedml hacktoberfest healthcare isic knight-challenge machine-learning medical medical-imaging multimodality python pytorch stoic vision

Last synced: 16 May 2025

https://github.com/kyegomez/pali3

Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"

artificial-intelligence autogpt gpt4 machine-learning multimodal multimodal-deep-learning multimodal-learning multimodality

Last synced: 13 Apr 2025

https://github.com/BiomedSciAI/fuse-med-ml

A python framework accelerating ML based discovery in the medical field by encouraging code reuse. Batteries included :)

ai cmmd collaboration ct deep-learning fuse fuse-med-ml fusemedml hacktoberfest healthcare isic knight-challenge machine-learning medical medical-imaging multimodality python pytorch stoic vision

Last synced: 08 May 2025

https://github.com/senwu/emmental

A deep learning framework for building multimodal multi-task learning systems.

machine-learning multi-task-learning multimodality

Last synced: 10 Apr 2025

https://github.com/kyegomez/pali

Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model"

artificial-intelligence gpt4 machine-learning multimodal multimodal-deep-learning multimodality

Last synced: 08 Sep 2025

https://github.com/lucidrains/mirasol-pytorch

Implementation of 🌻 Mirasol, SOTA Multimodal Autoregressive model out of Google Deepmind, in Pytorch

artificial-intelligence attention-mechanism deep-learning multimodality transformers

Last synced: 30 Oct 2025

https://github.com/forestsking/chattime

PyTorch implementation of "ChatTime: A Unified Multimodal Time Series Foundation Model Bridging Numerical and Textual Data" (AAAI 2025 [oral])

foundation-models multimodal-time-series multimodality time-series

Last synced: 08 Oct 2025

https://github.com/amazon-science/gluonmm

A library of transformer models for computer vision and multi-modality research

computer-vision iccv-2021 multimodality pytorch transformer video

Last synced: 03 May 2025

https://github.com/firojalam/harmful-memes-detection-resources

Resources (conference/journal publications, references to dataset) for harmful memes detection.

disinformation harmful-memes memes multimodality

Last synced: 10 Oct 2025

https://github.com/kyegomez/EXA-1

An EXA-Scale repository of Multi-Modality AI resources from papers and models, to foundational libraries!

artificial-intelligence dataset gpt4 jax kosmos large-dataset large-language-models multimodal multimodal-data multimodality pytorch pytorch-implementation triton

Last synced: 28 Mar 2025

https://github.com/ukplab/5pils

Code associated with the EMNLP 2024 Main paper: "Image, tell me your story!" Predicting the original meta-context of visual misinformation.

fact-checking fake-news large-language-models misinformation multimodal-large-language-models multimodality reverse-image-search

Last synced: 21 Jul 2025

https://github.com/turinglang/mcmctempering.jl

Implementations of parallel tempering algorithms to augment samplers with tempering capabilities

julia-language mcmc mcmc-sampling multimodality probabilistic-programming turing-language

Last synced: 13 Apr 2025

https://github.com/xability/maidr-legacy

[DEPRECATED prototype] Multimodal Access and Interactive Data Representation

ai blind braille chart data description image impairments llm low-vision multimodality plot representation science sonification tactile visual visualization

Last synced: 12 Feb 2026

https://github.com/thechymera/behaviopy

Behavioral data analysis and plotting in Python.

animal-behavior biomedical data-science foss multimodality plotting

Last synced: 19 Apr 2025

https://github.com/kyegomez/mmca

The open source community's implementation of the all-new Multi-Modal Causal Attention from "DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention"

artificial-intelligence attention attention-is-all-you-need attention-mechanism gpt4 multimodal multimodality neural-network neuralnetwork opensource-library opensourceforgood

Last synced: 07 May 2025

https://github.com/declare-lab/sealing

[NAACL 2024] Official Implementation of paper "Self-Adaptive Sampling for Efficient Video Question Answering on Image-Text Models"

multimodality naacl2024 video-question-answering video-understanding visual-language-models

Last synced: 14 Apr 2025

https://github.com/kyegomez/swarmalators

Pytorch Implementation of the Swarmalators algorithm from "Exotic swarming dynamics of high-dimensional swarmalators"

artificial-intelligence attention-is-all-you-need attention-mechanism machine-learning-algorithms multimodal multimodality swarm-cluster swarm-intelligence swarm-robotics swarms

Last synced: 07 May 2025

https://github.com/kyegomez/convnet

Implementation of the NFNets from the paper: "ConvNets Match Vision Transformers at Scale" by Google Research

ai convolutional-layers convolutional-neural-networks deeplearning machine-learning ml multimodal-learning multimodality

Last synced: 09 Oct 2025

https://github.com/soraxas/occ-traj120

A trajectories dataset with associated occupancy maps

dataset multimodality trajectory-clustering trajectory-planning trajectory-prediction

Last synced: 09 Feb 2026

https://github.com/mjunaidca/upwork-leads-gpt

Upwork Leads GPT is an AI-powered Job Finder tool for freelancers. It's built using OpenAI’s CustomGPT. It searches for the most relevant job postings based on provided keywords and can generate proposals.

api cloudflared conversational-ai conversational-ui cui custom-gpt fastapi job-search leads-generation multimodality openai-chatgpt upwork upwork-feed

Last synced: 01 Aug 2025

https://github.com/kyegomez/mmca-mgqa

Experiments around using Multi-Modal Causal Attention with Multi-Grouped Query Attention

artificial-intelligence attention attention-is-all-you-need attention-mechanism gpt4 multimodal multimodal-deep-learning multimodality

Last synced: 13 Oct 2025

https://github.com/warvito/integrating-multi-modal-neuroimaging

Integrating machine learning and multi-modal neuroimaging to detect schizophrenia at the level of the individual

multimodality neuroimage neuroimaging schizophrenia svm svm-classifier

Last synced: 23 Mar 2025

https://github.com/ishan-surana/federanet

Federated Multimodal Cyberattack classification model on social media messages (utilizing blockchain and quantum key cryptography). Blockchain server at https://cyberattack-blockchain.onrender.com/. Model interaction link below.

bb84-protocol blockchain convolutional-neural-networks cryptography federated-learning flask html-css-js information-security machine-learning multimodality natural-language-processing proof-of-authority streamlit zero-knowledge-proof

Last synced: 10 May 2026

https://github.com/stiebo/spring-ai-samples

Spring AI, chat client, vector store, RAG, multimodality samples

embeddings llm multimodality rag spring-ai spring-boot vectorstore

Last synced: 17 Apr 2026

https://github.com/nicolay-r/awesome-image-captioning-mllms

A curated list of awesome image captioning studies, aimed at annotating and reporting CT / MRI scans

image languagemodels multimodal-large-language-models multimodality nlp reports text

Last synced: 09 Feb 2026

https://github.com/elphinkuo/wheebrain

WheelBrain – Driving Intelligence to New Frontiers. Imagine a world where driving isn’t just autonomous but also exceptionally intelligent, safe and adaptive. That’s what WheelBrain brings to the table. Welcome to the future of smarter driving. This is a project for SambaNova Lightning Fast AI Hackathon.

agent autonomous-driving large-language-model llama3 llm multimodality planning reasoning

Last synced: 18 Feb 2026

https://github.com/elijahnzeli1/salesai

A unified multimodal generative AI system designed to learn and adapt across multiple modalities (text, audio, vision, robotics) with minimal data and long-term autonomy through reinforcement learning.

agi ai generative machine-learning ml ml-generative multimodal multimodal-ai multimodal-deep-learning multimodality unified unified-multimodal-models

Last synced: 12 Sep 2025

https://github.com/hase3b/bonbid-hie-segmentation

This repository implements a 3D U-Net model for segmenting HIE lesions in neonatal MRI scans, exploring various loss functions including Dice, Dice-Focal, Tversky, Hausdorff Distance, and hybrid loss functions. BONBID-HIE dataset is used for this study.

lesions loss-functions miccai2023 monai multimodality pytorch segmentation simpleitk torchio unet

Last synced: 22 May 2026