Projects in Awesome Lists tagged with visual-language-learning
A curated list of projects in awesome lists tagged with visual-language-learning .
https://github.com/haotian-liu/llava
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
chatbot chatgpt foundation-models gpt-4 instruction-tuning llama llama-2 llama2 llava multi-modality multimodal vision-language-model visual-language-learning
Last synced: 17 Nov 2025
https://github.com/haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
chatbot chatgpt foundation-models gpt-4 instruction-tuning llama llama-2 llama2 llava multi-modality multimodal vision-language-model visual-language-learning
Last synced: 14 Mar 2025
https://github.com/next-gpt/next-gpt
Code and models for ICML 2024 paper, NExT-GPT: Any-to-Any Multimodal Large Language Model
chatgpt foundation-models gpt-4 instruction-tuning large-language-models llm mllm multi-modal-chatgpt multimodal visual-language-learning
Last synced: 14 May 2025
https://github.com/NExT-GPT/NExT-GPT
Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
chatgpt foundation-models gpt-4 instruction-tuning large-language-models llm multi-modal-chatgpt multimodal visual-language-learning
Last synced: 12 Mar 2025
https://github.com/evolvinglmms-lab/otter
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
artificial-inteligence chatgpt deep-learning embodied-ai foundation-models gpt-4 instruction-tuning large-scale-models machine-learning multi-modality visual-language-learning
Last synced: 13 Dec 2025
https://github.com/internlm/internlm-xcomposer
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
chatgpt foundation gpt gpt-4 instruction-tuning language-model large-language-model large-vision-language-model llm mllm multi-modality multimodal supervised-finetuning vision-language-model vision-transformer visual-language-learning
Last synced: 15 May 2025
https://github.com/InternLM/InternLM-XComposer
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
chatgpt foundation gpt gpt-4 instruction-tuning language-model large-language-model large-vision-language-model llm mllm multi-modality multimodal supervised-finetuning vision-language-model vision-transformer visual-language-learning
Last synced: 07 May 2025
https://github.com/RLHF-V/RLHF-V
[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
chatbot gpt-4 llama multi-modality multimodal rlhf-v visual-language-learning
Last synced: 24 Feb 2025
https://github.com/thomas-yanxin/karmavlm
🧘🏻♂️KarmaVLM (相生):A family of high efficiency and powerful visual language model.
llama2 llava multimodel qwen2 vision-language-model visual-language-learning vlm
Last synced: 02 Aug 2025
https://github.com/adrianbzg/llama-multimodal-vqa
Multimodal Instruction Tuning for Llama 3
chatbot chatgpt gpt-4 huggingface instruction-tuning language-models llama llama2 llama3 multimodal multimodal-instruction-tuning visual-language-learning visual-question-answering vqa
Last synced: 25 Oct 2025
https://github.com/ashleykleynhans/llava-docker
Docker image for LLaVA: Large Language and Vision Assistant
ai chatbot chatgpt docker docker-image foundation-models gpt-4 instruction-tuning llama llama-2 llama2 llava llm multimodal runpod vision-language-model visual-language-learning
Last synced: 24 Dec 2025