Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with multi-modality
A curated list of projects in awesome lists tagged with multi-modality.
https://github.com/haotian-liu/llava
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
chatbot chatgpt foundation-models gpt-4 instruction-tuning llama llama-2 llama2 llava multi-modality multimodal vision-language-model visual-language-learning
Last synced: 16 Dec 2024
https://github.com/haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
chatbot chatgpt foundation-models gpt-4 instruction-tuning llama llama-2 llama2 llava multi-modality multimodal vision-language-model visual-language-learning
Last synced: 25 Oct 2024
https://github.com/jina-ai/clip-as-service
🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP
bert bert-as-service clip-as-service clip-model cross-modal-retrieval cross-modality deep-learning image2vec multi-modality neural-search onnx openai pytorch sentence-encoding sentence2vec
Last synced: 16 Dec 2024
https://github.com/lucidrains/deep-daze
Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network). Technique was originally created by https://twitter.com/advadnoun
artificial-intelligence deep-learning implicit-neural-representation multi-modality siren text-to-image transformers
Last synced: 18 Dec 2024
https://github.com/luodian/otter
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
artificial-inteligence chatgpt deep-learning embodied-ai foundation-models gpt-4 instruction-tuning large-scale-models machine-learning multi-modality visual-language-learning
Last synced: 19 Dec 2024
https://github.com/Luodian/Otter
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
artificial-inteligence chatgpt deep-learning embodied-ai foundation-models gpt-4 instruction-tuning large-scale-models machine-learning multi-modality visual-language-learning
Last synced: 24 Oct 2024
https://github.com/internlm/internlm-xcomposer
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
chatgpt foundation gpt gpt-4 instruction-tuning language-model large-language-model large-vision-language-model llm mllm multi-modality multimodal supervised-finetuning vision-language-model vision-transformer visual-language-learning
Last synced: 19 Dec 2024
https://github.com/InternLM/InternLM-XComposer
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
chatgpt foundation gpt gpt-4 instruction-tuning language-model large-language-model large-vision-language-model llm mllm multi-modality multimodal supervised-finetuning vision-language-model vision-transformer visual-language-learning
Last synced: 14 Nov 2024
https://github.com/kyegomez/swarms
The Enterprise-Grade Production-Ready Multi-Agent Orchestration Framework. Join our Community: https://discord.com/servers/agora-999382051935506503
agents ai artificial-intelligence attention-mechanism chatgpt gpt4 gpt4all huggingface langchain langchain-python machine-learning multi-modal-imaging multi-modality multimodal prompt-engineering prompt-toolkit prompting swarms transformer-models tree-of-thoughts
Last synced: 17 Dec 2024
https://github.com/DLR-RM/3DObjectTracking
Algorithms and Publications on 3D Object Tracking
accv2020 articulated computer-vision cvpr2022 ijcv iros2023 multi-body multi-modality object-tracking paper pose-estimation real-time rgbd tpami tracking
Last synced: 27 Oct 2024
https://github.com/opengvlab/multi-modality-arena
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
chat chatbot chatgpt gradio large-language-models llms multi-modality vision-language-model vqa
Last synced: 09 Nov 2024
https://github.com/OpenGVLab/Multi-Modality-Arena
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
chat chatbot chatgpt gradio large-language-models llms multi-modality vision-language-model vqa
Last synced: 04 Nov 2024
https://github.com/ziqihuangg/Collaborative-Diffusion
Collaborative Diffusion (CVPR 2023)
aigc diffusion-models face-editing face-generation gen-ai image-editing image-generation latent-diffusion-models multi-modality stable-diffusion
Last synced: 31 Oct 2024
https://github.com/kyegomez/Sophia
Effortless plug-and-play optimizer to cut model training costs by 50%. New optimizer that is 2x faster than Adam on LLMs.
artificial-intelligence chatgpt deep-learning multi-modality neural-network optimizer
Last synced: 29 Nov 2024
https://github.com/kyegomez/sophia
Effortless plug-and-play optimizer to cut model training costs by 50%. New optimizer that is 2x faster than Adam on LLMs.
artificial-intelligence chatgpt deep-learning multi-modality neural-network optimizer
Last synced: 21 Dec 2024
https://github.com/kyegomez/gemini
The open source implementation of Gemini, the model by Google that will "eclipse ChatGPT"
ai artificial-intelligence gemini gpt4 machine-learning ml multi-modality multimodla
Last synced: 18 Dec 2024
https://github.com/kyegomez/Gemini
The open source implementation of Gemini, the model by Google that will "eclipse ChatGPT"
ai artificial-intelligence gemini gpt4 machine-learning ml multi-modality multimodla
Last synced: 05 Nov 2024
https://github.com/zwwwayne/mmmot
[ICCV2019] Robust Multi-Modality Multi-Object Tracking
Last synced: 18 Nov 2024
https://github.com/ZwwWayne/mmMOT
[ICCV2019] Robust Multi-Modality Multi-Object Tracking
Last synced: 28 Oct 2024
https://github.com/dvlab-research/UVTR
Unifying Voxel-based Representation with Transformer for 3D Object Detection (NeurIPS 2022)
3d-detection multi-modality pytorch
Last synced: 28 Oct 2024
https://github.com/sshh12/multi_token
Embed arbitrary modalities (images, audio, documents, etc) into large language models.
large-context large-language-models large-multimodal-models llava llm multi-modality multimodal vision-language-model
Last synced: 17 Nov 2024
https://github.com/jina-ai/rungpt
An open-source, cloud-native serving framework for large multi-modal models (LMMs).
flamingo gpt-4 large-language-models large-multimadality-models llama llm-hosting llm-serve lmm-serve multi-modality opengpt self-hosting transformers
Last synced: 17 Dec 2024
https://github.com/kyegomez/the-compiler
Seed, Code, Harvest: Grow Your Own App with Tree of Thoughts!
agora artficial-intelligence autogpt chain-of-thought chatgpt deep-learning deep-learning-algorithms multi-modal-fusion multi-modality multimodal-deep-learning prompt-engineering reinforcement-learning tree-of-thoughts
Last synced: 18 Dec 2024
https://github.com/kyegomez/andromeda
An all-new Language Model That Processes Ultra-Long Sequences of 100,000+ Ultra-Fast
agi artificial-general-intelligence artificial-intelligence artificial-intelligence-algorithms deep-learning gpt-4 language-model large-language-models multi-modality multimodal neural-networks transformer
Last synced: 16 Dec 2024
https://github.com/kyegomez/mambabyte
Implementation of MambaByte from the paper "MambaByte: Token-free Selective State Space Model" in PyTorch and Zeta
ai artificial-intelligence gpt4v machine-learning mamba megabyte ml multi-modality tokenizer
Last synced: 21 Dec 2024
https://github.com/kyegomez/MambaByte
Implementation of MambaByte from the paper "MambaByte: Token-free Selective State Space Model" in PyTorch and Zeta
ai artificial-intelligence gpt4v machine-learning mamba megabyte ml multi-modality tokenizer
Last synced: 28 Oct 2024
https://github.com/kyegomez/kosmos2.5
My implementation of Kosmos2.5 from the paper: "KOSMOS-2.5: A Multimodal Literate Model"
attention attention-is-all-you-need gpt3 gpt4 kosmos multi-modality multimodal multimodal-deep-learning opensource
Last synced: 16 Dec 2024
https://github.com/rentainhe/trar-vqa
[ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"
attention clevr dynamic-network iccv2021 local-and-global multi-modal multi-modal-learning multi-modality multi-scale-features official pytorch transformer vision-and-language visual-question-answering visualization vqav2
Last synced: 07 Nov 2024
https://github.com/kyegomez/moe-mamba
Implementation of MoE Mamba from the paper "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in PyTorch and Zeta
ai ml moe multi-modal-fusion multi-modality swarms
Last synced: 09 Nov 2024
https://github.com/amazon-science/crossmodal-contrastive-learning
CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations, ICCV 2021
computer-vision contrastive-learning multi-modality natural-language-processing transformers video video-captioning video-text-retrieval
Last synced: 12 Nov 2024
https://chenshuang-zhang.github.io/imagenet_d/
benchmark computer-vision dataset diffusion-models generative-models image-recognition imagenet large-language-model multi-modality out-of-distribution recognition robustness stable-diffusion synthetic-data text-to-image-synthesis vision-language-model
Last synced: 02 Nov 2024
https://github.com/kyegomez/qformer
Implementation of Qformer from BLIP2 in Zeta Lego blocks.
ai artificial-intelligence attention-mechanism blip2 machine machine-learning multi-modal multi-modality
Last synced: 19 Dec 2024
https://github.com/trendscenter/fit
Fusion ICA Toolbox (MATLAB)
analysis cca eeg fmri gene ica iva joint-ica matlab mcca multi-modality parallel-ica pca
Last synced: 24 Nov 2024
https://github.com/kyegomez/mm1
PyTorch Implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training"
ai artificial-intelligence deep-learning gpt4 machine-learning ml mm1 multi-modal multi-modal-revolution multi-modality
Last synced: 16 Nov 2024
https://github.com/kyegomez/fuyu
Implementation of Adept's all-new Fuyu multi-modality model in PyTorch
ai artificial-intelligence gpt4 gpt5 machine-learning multi-modal multi-modality
Last synced: 09 Nov 2024
https://github.com/kyegomez/forest-of-thoughts
A forest of autonomous agents.
ai artificial-intelligence machine-learning ml multi-modal multi-modality
Last synced: 09 Nov 2024
https://github.com/kyegomez/swarmos
An all-new OS that orchestrates autonomous agents as workers to execute tasks.
ai asynchronous asynchronous-programming concurrent gpt4 llms ml multi-modality multithreading operating-system os swarms
Last synced: 09 Nov 2024
https://github.com/kyegomez/athena-for-search
The World's First AI-Enabled Multi-Modality Native Search Engine
agora apacai artificial-intelligence bing chatgpt chatgpt-api data data-engineering google human-computer-interaction multi-modal-imaging multi-modality multi-modality-data search-algorithm search-engine user-interface
Last synced: 09 Nov 2024
https://github.com/kyegomez/mc-vit
Implementation of the MC-ViT model from the paper "Memory Consolidation Enables Long-Context Video Understanding"
ai multi-modal multi-modal-transformers multi-modality open-source transformer transformers vit
Last synced: 09 Nov 2024
https://github.com/kyegomez/hrtx
Multi-Modal Multi-Embodied Hivemind-like Iteration of RTX-2
ai artificial-intelligence ensemble gpt4v machine-learning ml multi-modal multi-modality rt-2 rtx
Last synced: 16 Nov 2024
https://github.com/kyegomez/tinygptv
Simple Implementation of TinyGPTV in super simple Zeta lego blocks
artificial-intelligence attention attention-is-all-you-need deep-learning multi-modal multi-modality transformers
Last synced: 09 Nov 2024
https://github.com/kyegomez/mlxtransformer
Simple Implementation of a Transformer in the new framework MLX by Apple
artificial-intelligence gpt4 machine-learning multi-modal multi-modality
Last synced: 09 Nov 2024
https://github.com/kyegomez/hsss
Implementation of a Hierarchical Mamba as described in the paper: "Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling"
ai artificial-intelligence jesus machine-learning ml multi-modal multi-modality open-source pytorch rnn rnns ssms tensorflow zeta
Last synced: 09 Nov 2024
https://github.com/kyegomez/multimodal-tot
Multi-Modal Tree of Thoughts for DALLE-3-like automatic self-improvement
artificial-intelligence gpt4 multi-modal multi-modality multi-modality-data
Last synced: 09 Nov 2024
https://github.com/xufangzhi/moca
The implementation of MoCA
multi-modality textbook-question-answering
Last synced: 19 Dec 2024
https://github.com/kyegomez/visiondatasets
Open source scripts to create large scale datasets with rich detail for multi-modal models
ai artificial-intelligence function-calling gpt3 gpt4 json machine-learning ml multi-modal multi-modality pytorch tensorflow
Last synced: 09 Nov 2024
https://github.com/kyegomez/gats
Implementation of GATS from the paper "GATS: Gather-Attend-Scatter" in PyTorch and Zeta
ai attention attention-is-all-you-need attention-mechanism gpt4 llama ml multi-modal multi-modality multimodal open-source
Last synced: 10 Oct 2024
https://github.com/kyegomez/vortexfusion
Transformers + Mambas + LSTMs All in One Model
agora ai ai-research deep-learning lstms mambas ml multi-modality ssms transformers
Last synced: 09 Nov 2024
https://github.com/kyegomez/aoa-torch
Implementation of Attention on Attention in Zeta
ai artificial-intelligence gpt4 machine-learning multi-modal multi-modality research
Last synced: 09 Nov 2024
https://github.com/ravi-teja-konda/tunedllavadelights
Explore the rich flavors of Indian desserts with TunedLlavaDelights. Using Llava fine-tuning, our project unveils detailed nutritional profiles, taste notes, and optimal consumption times for beloved sweets. Dive into a fusion of AI innovation and culinary tradition.
chatgpt dalle2 dessert finetuning gpt4 gpt4v llama2 llava multi-modality multimodal nutrition nutrition-information stable-diffusion tranformers vision-language-learning vision-language-model
Last synced: 15 Nov 2024
https://github.com/nagababumo/open-source-models-with-hugging-face
asr audio-detection audio-processing automatic-speech-recognition blip clip huggingface huggingface-spaces huggingface-transformers image-captioning image-classification image-retrieval multi-modality object-detection open-source segementation sentence-embeddings transformers visual-question-answering zero-shot-learning
Last synced: 14 Nov 2024
https://github.com/yuanze-lin/olympus
The official code for "Olympus: A Universal Task Router for Computer Vision Tasks"
chatbot chatgpt deeplearning foundation-models instruction-tuning llava llms mllms multi-modality multimodal pytorch vision-language-model
Last synced: 14 Dec 2024