An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with multi-modality

A curated list of projects in awesome lists tagged with multi-modality.

https://github.com/haotian-liu/llava

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

chatbot chatgpt foundation-models gpt-4 instruction-tuning llama llama-2 llama2 llava multi-modality multimodal vision-language-model visual-language-learning

Last synced: 17 Nov 2025

https://github.com/haotian-liu/LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

chatbot chatgpt foundation-models gpt-4 instruction-tuning llama llama-2 llama2 llava multi-modality multimodal vision-language-model visual-language-learning

Last synced: 14 Mar 2025

https://github.com/lucidrains/deep-daze

Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network). Technique was originally created by https://twitter.com/advadnoun

artificial-intelligence deep-learning implicit-neural-representation multi-modality siren text-to-image transformers

Last synced: 10 Apr 2025

https://github.com/evolvinglmms-lab/otter

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.

artificial-inteligence chatgpt deep-learning embodied-ai foundation-models gpt-4 instruction-tuning large-scale-models machine-learning multi-modality visual-language-learning

Last synced: 13 Dec 2025

https://github.com/opengvlab/multi-modality-arena

Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!

chat chatbot chatgpt gradio large-language-models llms multi-modality vision-language-model vqa

Last synced: 20 Apr 2025

https://github.com/OpenGVLab/Multi-Modality-Arena

Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!

chat chatbot chatgpt gradio large-language-models llms multi-modality vision-language-model vqa

Last synced: 03 Apr 2025

https://github.com/kyegomez/Sophia

Effortless plug-and-play optimizer to cut model training costs by 50%. New optimizer that is 2x faster than Adam on LLMs.

artificial-intelligence chatgpt deep-learning multi-modality neural-network optimizer

Last synced: 22 Jul 2025

https://github.com/kyegomez/sophia

Effortless plug-and-play optimizer to cut model training costs by 50%. New optimizer that is 2x faster than Adam on LLMs.

artificial-intelligence chatgpt deep-learning multi-modality neural-network optimizer

Last synced: 23 Oct 2025

https://github.com/kyegomez/Gemini

The open source implementation of Gemini, the model by Google that will "eclipse ChatGPT".

ai artificial-intelligence gemini gpt4 machine-learning ml multi-modality multimodla

Last synced: 05 Apr 2025

https://github.com/kyegomez/gemini

The open source implementation of Gemini, the model by Google that will "eclipse ChatGPT".

ai artificial-intelligence gemini gpt4 machine-learning ml multi-modality multimodla

Last synced: 04 Apr 2025

https://github.com/dvlab-research/visionzip

Official repository for VisionZip (CVPR 2025)

efficiency multi-modality vision-language-model vlms

Last synced: 03 Jul 2025

https://github.com/JIA-Lab-research/MGM-Omni

MGM-Omni: Scaling Omni LLMs to Personalized Long-Horizon Speech

audio-language-model multi-modal-large-language-model multi-modality multimodal text-to-speech

Last synced: 18 Jan 2026

https://github.com/zwwwayne/mmmot

[ICCV2019] Robust Multi-Modality Multi-Object Tracking

iccv2019 mot multi-modality

Last synced: 08 May 2025

https://github.com/ZwwWayne/mmMOT

[ICCV2019] Robust Multi-Modality Multi-Object Tracking

iccv2019 mot multi-modality

Last synced: 20 Mar 2025

https://github.com/dvlab-research/uvtr

Unifying Voxel-based Representation with Transformer for 3D Object Detection (NeurIPS 2022)

3d-detection multi-modality pytorch

Last synced: 03 Jul 2025

https://github.com/RLHF-V/RLHF-V

[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

chatbot gpt-4 llama multi-modality multimodal rlhf-v visual-language-learning

Last synced: 24 Feb 2025

https://github.com/dvlab-research/UVTR

Unifying Voxel-based Representation with Transformer for 3D Object Detection (NeurIPS 2022)

3d-detection multi-modality pytorch

Last synced: 20 Mar 2025

https://github.com/dvlab-research/VisionZip

Official repo for "VisionZip: Longer is Better but Not Necessary in Vision Language Models"

efficiency multi-modality vision-language-model vlms

Last synced: 23 Sep 2025

https://github.com/sshh12/multi_token

Embed arbitrary modalities (images, audio, documents, etc) into large language models.

large-context large-language-models large-multimodal-models llava llm multi-modality multimodal vision-language-model

Last synced: 07 May 2025

https://github.com/jina-ai/rungpt

An open-source, cloud-native serving framework for large multi-modal models (LMMs).

flamingo gpt-4 large-language-models large-multimadality-models llama llm-hosting llm-serve lmm-serve multi-modality opengpt self-hosting transformers

Last synced: 02 Apr 2026

https://github.com/kyegomez/mambabyte

Implementation of MambaByte from the paper "MambaByte: Token-free Selective State Space Model" in PyTorch and Zeta

ai artificial-intelligence gpt4v machine-learning mamba megabyte ml multi-modality tokenizer

Last synced: 04 Apr 2025

https://github.com/dvlab-research/prompt-highlighter

[CVPR 2024] Prompt Highlighter: Interactive Control for Multi-Modal LLMs

llm-inference multi-modality text-generation

Last synced: 03 Jul 2025

https://github.com/kyegomez/moe-mamba

Implementation of MoE Mamba from the paper "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in PyTorch and Zeta

ai ml moe multi-modal-fusion multi-modality swarms

Last synced: 07 May 2025

https://github.com/kyegomez/MambaByte

Implementation of MambaByte from the paper "MambaByte: Token-free Selective State Space Model" in PyTorch and Zeta

ai artificial-intelligence gpt4v machine-learning mamba megabyte ml multi-modality tokenizer

Last synced: 20 Mar 2025

https://github.com/skit-ai/speechllm

This repository contains the training, inference, and evaluation code for SpeechLLM models, along with details about the model releases on Hugging Face.

conversational-ai llm multi-modal-llms multi-modality speech

Last synced: 20 Feb 2026

https://github.com/kyegomez/kosmos2.5

My implementation of Kosmos2.5 from the paper: "KOSMOS-2.5: A Multimodal Literate Model"

attention attention-is-all-you-need gpt3 gpt4 kosmos multi-modality multimodal multimodal-deep-learning opensource

Last synced: 06 Apr 2025

https://github.com/rentainhe/trar-vqa

[ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"

attention clevr dynamic-network iccv2021 local-and-global multi-modal multi-modal-learning multi-modality multi-scale-features official pytorch transformer vision-and-language visual-question-answering visualization vqav2

Last synced: 28 Aug 2025

https://github.com/kyegomez/qformer

Implementation of Qformer from BLIP2 in Zeta Lego blocks.

ai artificial-intelligence attention-mechanism blip2 machine machine-learning multi-modal multi-modality

Last synced: 06 Mar 2026

https://github.com/kyegomez/fuyu

Implementation of Adept's all-new Fuyu multi-modality model in PyTorch

ai artificial-intelligence gpt4 gpt5 machine-learning multi-modal multi-modality

Last synced: 07 May 2025

https://github.com/kyegomez/mm1

PyTorch Implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training"

ai artificial-intelligence deep-learning gpt4 machine-learning ml mm1 multi-modal multi-modal-revolution multi-modality

Last synced: 15 Apr 2025

https://github.com/kyegomez/mc-vit

Implementation of the model: "(MC-ViT)" from the paper: "Memory Consolidation Enables Long-Context Video Understanding"

ai multi-modal multi-modal-transformers multi-modality open-source transformer transformers vit

Last synced: 17 Aug 2025

https://github.com/kyegomez/mlxtransformer

Simple Implementation of a Transformer in the new framework MLX by Apple

artificial-intelligence gpt4 machine-learning multi-modal multi-modality

Last synced: 04 Oct 2025

https://github.com/kyegomez/swarmos

An all-new OS that orchestrates autonomous agents as workers to execute tasks.

ai asynchronous asynchronous-programming concurrent gpt4 llms ml multi-modality multithreading operating-system os swarms

Last synced: 07 May 2025

https://github.com/kyegomez/tinygptv

Simple Implementation of TinyGPTV in super simple Zeta lego blocks

artificial-intelligence attention attention-is-all-you-need deep-learning multi-modal multi-modality transformers

Last synced: 07 May 2025

https://github.com/kyegomez/hrtx

Multi-Modal Multi-Embodied Hivemind-like Iteration of RTX-2

ai artificial-intelligence ensemble gpt4v machine-learning ml multi-modal multi-modality rt-2 rtx

Last synced: 16 Apr 2025

https://github.com/kyegomez/multimodal-tot

Multi-Modal Tree of Thoughts for DALL-E 3-like automatic self-improvement

artificial-intelligence gpt4 multi-modal multi-modality multi-modality-data

Last synced: 07 May 2025

https://github.com/kyegomez/hsss

Implementation of a Hierarchical Mamba as described in the paper: "Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling"

ai artificial-intelligence jesus machine-learning ml multi-modal multi-modality open-source pytorch rnn rnns ssms tensorflow zeta

Last synced: 07 May 2025

https://github.com/xufangzhi/moca

[Pattern Recognition] The implementation of MoCA

multi-modality textbook-question-answering

Last synced: 20 Aug 2025

https://github.com/kyegomez/visiondatasets

Open source scripts to create large scale datasets with rich detail for multi-modal models

ai artificial-intelligence function-calling gpt3 gpt4 json machine-learning ml multi-modal multi-modality pytorch tensorflow

Last synced: 07 May 2025

https://github.com/voidful/mmlm

Toward Multi Modality Language Model - implementation of GPT-4o/Project Astra

gpt-4o multi-modality multi-model world-models

Last synced: 04 Sep 2025

https://github.com/kyegomez/gats

Implementation of GATS from the paper "GATS: Gather-Attend-Scatter" in PyTorch and Zeta

ai attention attention-is-all-you-need attention-mechanism gpt4 llama ml multi-modal multi-modality multimodal open-source

Last synced: 25 Oct 2025

https://github.com/kyegomez/vortexfusion

Transformers + Mambas + LSTMs, all in one model

agora ai ai-research deep-learning lstms mambas ml multi-modality ssms transformers

Last synced: 12 Oct 2025

https://github.com/anondo1969/fedsepsis

Repository for the journal article, 'FedSepsis: A Federated Multi-Modal Deep Learning-Based Internet of Medical Things Application for Early Detection of Sepsis from Electronic Health Records Using Raspberry Pi and Jetson Nano Devices', Mahbub Ul Alam, Rahim Rahmani. Sensors 23, no. 2: 970, https://doi.org/10.3390/s23020970.

clinical-decision-support-system clinicalbert deep-learning early-sepsis-detection electronic-health-records federated-learning gan health-informatics internet-of-medical-things iomt jetson-nano machine-learning multi-modality natural-language-processing nlp raspberry-pi sepsis smart-healthcare

Last synced: 08 May 2025

https://github.com/kyegomez/aoa-torch

Implementation of Attention on Attention in Zeta

ai artificial-intelligence gpt4 machine-learning multi-modal multi-modality research

Last synced: 07 May 2025

https://github.com/kyegomez/ai-reading-list

This collection brings together the highest-signal research papers in modern AI, from the invention of the Transformer to the frontier work of 2024–2025, into a single curated map of the field.

agents ai ai-workflows attention attention-models diffusion diffusion-models flow-matching flux ml multi-modality research-paper transformers

Last synced: 18 Jan 2026

https://github.com/ravi-teja-konda/tunedllavadelights

Explore the rich flavors of Indian desserts with TunedLlavaDelights. Using LLaVA fine-tuning, our project unveils detailed nutritional profiles, taste notes, and optimal consumption times for beloved sweets. Dive into a fusion of AI innovation and culinary tradition.

chatgpt dalle2 dessert finetuning gpt4 gpt4v llama2 llava multi-modality multimodal nutrition nutrition-information stable-diffusion tranformers vision-language-learning vision-language-model

Last synced: 21 Apr 2026

https://github.com/anondo1969/fedsemicoviddetector

Repository for the journal article, 'Federated Semi-Supervised Multi-Task Learning to Detect COVID-19 and Lungs Segmentation Marking Using Chest Radiography Images and Raspberry Pi Devices: An Internet of Medical Things Application', Mahbub Ul Alam, Rahim Rahmani. Sensors 21, no. 15: 5025, https://doi.org/10.3390/s21155025.

clinical-decision-support-system covid-19-detection deep-learning electronic-health-records federated-learning health-informatics internet-of-medical-things iomt lungs-segmentation-detection machine-learning multi-modality raspberry-pi smart-healthcare

Last synced: 08 May 2025

https://github.com/anondo1969/thremaltimodal-covidetector

Repository for the conference paper 'COVID-19 detection from thermal image and tabular medical data utilizing multi-modal machine learning', Mahbub Ul Alam, Jaakko Hollmén and Rahim Rahmani. IEEE 36th International Symposium on Computer-Based Medical Systems (CBMS), 2023, pp. 646-653.

covid-19-detection deep-learning internet-of-medical-things machine-learning multi-modality tabular-medical-data thermal-image

Last synced: 08 May 2025

https://github.com/jonathanjsjsc/swarm

🦟 Interactive swarm simulation where pointer swarms follow your cursor - WebGL / threejs

attention-mechanism docker java machine-learning multi-modality mybatis prompt-toolkit pso rabbitmq spring-cloud stable-diffusion swarm swarms tsp

Last synced: 02 Aug 2025

https://github.com/yuanze-lin/olympus

The official code for "Olympus: A Universal Task Router for Computer Vision Tasks"

chatbot chatgpt deeplearning foundation-models instruction-tuning llava llms mllms multi-modality multimodal pytorch vision-language-model

Last synced: 01 Apr 2025