Projects in Awesome Lists tagged with multi-modality

https://github.com/haotian-liu/llava

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

chatbot chatgpt foundation-models gpt-4 instruction-tuning llama llama-2 llama2 llava multi-modality multimodal vision-language-model visual-language-learning

Last synced: 16 Dec 2024

https://github.com/haotian-liu/LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

chatbot chatgpt foundation-models gpt-4 instruction-tuning llama llama-2 llama2 llava multi-modality multimodal vision-language-model visual-language-learning

Last synced: 25 Oct 2024

https://github.com/jina-ai/clip-as-service

🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP

bert bert-as-service clip-as-service clip-model cross-modal-retrieval cross-modality deep-learning image2vec multi-modality neural-search onnx openai pytorch sentence-encoding sentence2vec

Last synced: 16 Dec 2024

https://github.com/lucidrains/deep-daze

Simple command-line tool for text-to-image generation using OpenAI's CLIP and Siren (an implicit neural representation network). The technique was originally created by https://twitter.com/advadnoun

artificial-intelligence deep-learning implicit-neural-representation multi-modality siren text-to-image transformers

Last synced: 18 Dec 2024

https://github.com/luodian/otter

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.

artificial-inteligence chatgpt deep-learning embodied-ai foundation-models gpt-4 instruction-tuning large-scale-models machine-learning multi-modality visual-language-learning

Last synced: 19 Dec 2024

https://github.com/Luodian/Otter

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.

artificial-inteligence chatgpt deep-learning embodied-ai foundation-models gpt-4 instruction-tuning large-scale-models machine-learning multi-modality visual-language-learning

Last synced: 24 Oct 2024

https://github.com/internlm/internlm-xcomposer

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

chatgpt foundation gpt gpt-4 instruction-tuning language-model large-language-model large-vision-language-model llm mllm multi-modality multimodal supervised-finetuning vision-language-model vision-transformer visual-language-learning

Last synced: 19 Dec 2024

https://github.com/InternLM/InternLM-XComposer

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

chatgpt foundation gpt gpt-4 instruction-tuning language-model large-language-model large-vision-language-model llm mllm multi-modality multimodal supervised-finetuning vision-language-model vision-transformer visual-language-learning

Last synced: 14 Nov 2024

https://github.com/kyegomez/swarms

The Enterprise-Grade Production-Ready Multi-Agent Orchestration Framework. Join our community: https://discord.com/servers/agora-999382051935506503

agents ai artificial-intelligence attention-mechanism chatgpt gpt4 gpt4all huggingface langchain langchain-python machine-learning multi-modal-imaging multi-modality multimodal prompt-engineering prompt-toolkit prompting swarms transformer-models tree-of-thoughts

Last synced: 17 Dec 2024

https://github.com/DLR-RM/3DObjectTracking

Algorithms and Publications on 3D Object Tracking

accv2020 articulated computer-vision cvpr2022 ijcv iros2023 multi-body multi-modality object-tracking paper pose-estimation real-time rgbd tpami tracking

Last synced: 27 Oct 2024

https://github.com/opengvlab/multi-modality-arena

Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!

chat chatbot chatgpt gradio large-language-models llms multi-modality vision-language-model vqa

Last synced: 09 Nov 2024

https://github.com/OpenGVLab/Multi-Modality-Arena

Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!

chat chatbot chatgpt gradio large-language-models llms multi-modality vision-language-model vqa

Last synced: 04 Nov 2024

https://github.com/ziqihuangg/Collaborative-Diffusion

Collaborative Diffusion (CVPR 2023)

aigc diffusion-models face-editing face-generation gen-ai image-editing image-generation latent-diffusion-models multi-modality stable-diffusion

Last synced: 31 Oct 2024

https://github.com/kyegomez/Sophia

Effortless plug-and-play optimizer to cut model training costs by 50%. A new optimizer that is 2x faster than Adam on LLMs.

artificial-intelligence chatgpt deep-learning multi-modality neural-network optimizer

Last synced: 29 Nov 2024

https://github.com/kyegomez/sophia

Effortless plug-and-play optimizer to cut model training costs by 50%. A new optimizer that is 2x faster than Adam on LLMs.

artificial-intelligence chatgpt deep-learning multi-modality neural-network optimizer

Last synced: 21 Dec 2024

https://github.com/kyegomez/gemini

The open source implementation of Gemini, the Google model that will "eclipse ChatGPT"

ai artificial-intelligence gemini gpt4 machine-learning ml multi-modality multimodla

Last synced: 18 Dec 2024

https://github.com/kyegomez/Gemini

The open source implementation of Gemini, the Google model that will "eclipse ChatGPT"

ai artificial-intelligence gemini gpt4 machine-learning ml multi-modality multimodla

Last synced: 05 Nov 2024

https://github.com/zwwwayne/mmmot

[ICCV2019] Robust Multi-Modality Multi-Object Tracking

iccv2019 mot multi-modality

Last synced: 18 Nov 2024

https://github.com/ZwwWayne/mmMOT

[ICCV2019] Robust Multi-Modality Multi-Object Tracking

iccv2019 mot multi-modality

Last synced: 28 Oct 2024

https://github.com/dvlab-research/UVTR

Unifying Voxel-based Representation with Transformer for 3D Object Detection (NeurIPS 2022)

3d-detection multi-modality pytorch

Last synced: 28 Oct 2024

https://github.com/sshh12/multi_token

Embed arbitrary modalities (images, audio, documents, etc) into large language models.

large-context large-language-models large-multimodal-models llava llm multi-modality multimodal vision-language-model

Last synced: 17 Nov 2024

https://github.com/jina-ai/rungpt

An open-source, cloud-native serving framework for large multi-modal models (LMMs).

flamingo gpt-4 large-language-models large-multimadality-models llama llm-hosting llm-serve lmm-serve multi-modality opengpt self-hosting transformers

Last synced: 17 Dec 2024

https://github.com/kyegomez/the-compiler

Seed, Code, Harvest: Grow Your Own App with Tree of Thoughts!

agora artficial-intelligence autogpt chain-of-thought chatgpt deep-learning deep-learning-algorithms multi-modal-fusion multi-modality multimodal-deep-learning prompt-engineering reinforcement-learning tree-of-thoughts

Last synced: 18 Dec 2024

https://github.com/kyegomez/andromeda

An all-new Language Model That Processes Ultra-Long Sequences of 100,000+ Ultra-Fast

agi artificial-general-intelligence artificial-intelligence artificial-intelligence-algorithms deep-learning gpt-4 language-model large-language-models multi-modality multimodal neural-networks transformer

Last synced: 16 Dec 2024

https://github.com/kyegomez/mambabyte

Implementation of MambaByte from the paper "MambaByte: Token-free Selective State Space Model" in PyTorch and Zeta

ai artificial-intelligence gpt4v machine-learning mamba megabyte ml multi-modality tokenizer

Last synced: 21 Dec 2024

https://github.com/kyegomez/MambaByte

Implementation of MambaByte from the paper "MambaByte: Token-free Selective State Space Model" in PyTorch and Zeta

ai artificial-intelligence gpt4v machine-learning mamba megabyte ml multi-modality tokenizer

Last synced: 28 Oct 2024

https://github.com/kyegomez/kosmos2.5

My implementation of Kosmos2.5 from the paper: "KOSMOS-2.5: A Multimodal Literate Model"

attention attention-is-all-you-need gpt3 gpt4 kosmos multi-modality multimodal multimodal-deep-learning opensource

Last synced: 16 Dec 2024

https://github.com/rentainhe/trar-vqa

[ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"

attention clevr dynamic-network iccv2021 local-and-global multi-modal multi-modal-learning multi-modality multi-scale-features official pytorch transformer vision-and-language visual-question-answering visualization vqav2

Last synced: 07 Nov 2024

https://github.com/kyegomez/moe-mamba

Implementation of MoE-Mamba from the paper "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in PyTorch and Zeta

ai ml moe multi-modal-fusion multi-modality swarms

Last synced: 09 Nov 2024

https://github.com/amazon-science/crossmodal-contrastive-learning

CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations, ICCV 2021

computer-vision contrastive-learning multi-modality natural-language-processing transformers video video-captioning video-text-retrieval

Last synced: 12 Nov 2024

https://chenshuang-zhang.github.io/imagenet_d/

benchmark computer-vision dataset diffusion-models generative-models image-recognition imagenet large-language-model multi-modality out-of-distribution recognition robustness stable-diffusion synthetic-data text-to-image-synthesis vision-language-model

Last synced: 02 Nov 2024

https://github.com/kyegomez/qformer

Implementation of Qformer from BLIP2 in Zeta Lego blocks.

ai artificial-intelligence attention-mechanism blip2 machine machine-learning multi-modal multi-modality

Last synced: 19 Dec 2024

https://github.com/trendscenter/fit

Fusion ICA Toolbox (MATLAB)

analysis cca eeg fmri gene ica iva joint-ica matlab mcca multi-modality parallel-ica pca

Last synced: 24 Nov 2024

https://github.com/kyegomez/mm1

PyTorch Implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training"

ai artificial-intelligence deep-learning gpt4 machine-learning ml mm1 multi-modal multi-modal-revolution multi-modality

Last synced: 16 Nov 2024

https://github.com/kyegomez/fuyu

Implementation of Adept's all-new Fuyu multi-modality model in PyTorch

ai artificial-intelligence gpt4 gpt5 machine-learning multi-modal multi-modality

Last synced: 09 Nov 2024

https://github.com/kyegomez/forest-of-thoughts

A forest of autonomous agents.

ai artificial-intelligence machine-learning ml multi-modal multi-modality

Last synced: 09 Nov 2024

https://github.com/kyegomez/swarmos

An all-new OS that orchestrates autonomous agents as workers to execute tasks.

ai asynchronous asynchronous-programming concurrent gpt4 llms ml multi-modality multithreading operating-system os swarms

Last synced: 09 Nov 2024

https://github.com/kyegomez/athena-for-search

The World's First AI-Enabled Multi-Modality Native Search Engine

agora apacai artificial-intelligence bing chatgpt chatgpt-api data data-engineering google human-computer-interaction multi-modal-imaging multi-modality multi-modality-data search-algorithm search-engine user-interface

Last synced: 09 Nov 2024

https://github.com/kyegomez/mc-vit

Implementation of the model: "(MC-ViT)" from the paper: "Memory Consolidation Enables Long-Context Video Understanding"

ai multi-modal multi-modal-transformers multi-modality open-source transformer transformers vit

Last synced: 09 Nov 2024

https://github.com/kyegomez/hrtx

Multi-Modal Multi-Embodied Hivemind-like Iteration of RTX-2

ai artificial-intelligence ensemble gpt4v machine-learning ml multi-modal multi-modality rt-2 rtx

Last synced: 16 Nov 2024

https://github.com/kyegomez/tinygptv

Simple Implementation of TinyGPTV in super simple Zeta lego blocks

artificial-intelligence attention attention-is-all-you-need deep-learning multi-modal multi-modality transformers

Last synced: 09 Nov 2024

https://github.com/kyegomez/mlxtransformer

Simple Implementation of a Transformer in the new framework MLX by Apple

artificial-intelligence gpt4 machine-learning multi-modal multi-modality

Last synced: 09 Nov 2024

https://github.com/kyegomez/hsss

Implementation of a Hierarchical Mamba as described in the paper: "Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling"

ai artificial-intelligence jesus machine-learning ml multi-modal multi-modality open-source pytorch rnn rnns ssms tensorflow zeta

Last synced: 09 Nov 2024

https://github.com/kyegomez/multimodal-tot

Multi-modal Tree of Thoughts for DALL-E 3-like automatic self-improvement

artificial-intelligence gpt4 multi-modal multi-modality multi-modality-data

Last synced: 09 Nov 2024

https://github.com/xufangzhi/moca

The implementation of MoCA

multi-modality textbook-question-answering

Last synced: 19 Dec 2024

https://github.com/kyegomez/visiondatasets

Open source scripts to create large scale datasets with rich detail for multi-modal models

ai artificial-intelligence function-calling gpt3 gpt4 json machine-learning ml multi-modal multi-modality pytorch tensorflow

Last synced: 09 Nov 2024

https://github.com/kyegomez/gats

Implementation of GATS from the paper "GATS: Gather-Attend-Scatter" in PyTorch and Zeta

ai attention attention-is-all-you-need attention-mechanism gpt4 llama ml multi-modal multi-modality multimodal open-source

Last synced: 10 Oct 2024

https://github.com/kyegomez/vortexfusion

Transformers + Mambas + LSTMs All in One Model

agora ai ai-research deep-learning lstms mambas ml multi-modality ssms transformers

Last synced: 09 Nov 2024

https://github.com/kyegomez/aoa-torch

Implementation of Attention on Attention in Zeta

ai artificial-intelligence gpt4 machine-learning multi-modal multi-modality research

Last synced: 09 Nov 2024

https://github.com/ravi-teja-konda/tunedllavadelights

Explore the rich flavors of Indian desserts with TunedLlavaDelights. Using LLaVA fine-tuning, our project unveils detailed nutritional profiles, taste notes, and optimal consumption times for beloved sweets. Dive into a fusion of AI innovation and culinary tradition.

chatgpt dalle2 dessert finetuning gpt4 gpt4v llama2 llava multi-modality multimodal nutrition nutrition-information stable-diffusion tranformers vision-language-learning vision-language-model

Last synced: 15 Nov 2024

https://github.com/nagababumo/open-source-models-with-hugging-face

asr audio-detection audio-processing automatic-speech-recognition blip clip huggingface huggingface-spaces huggingface-transformers image-captioning image-classification image-retrieval multi-modality object-detection open-source segementation sentence-embeddings transformers visual-question-answering zero-shot-learning

Last synced: 14 Nov 2024

https://github.com/yuanze-lin/olympus

The official code for "Olympus: A Universal Task Router for Computer Vision Tasks"

chatbot chatgpt deeplearning foundation-models instruction-tuning llava llms mllms multi-modality multimodal pytorch vision-language-model

Last synced: 14 Dec 2024