Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with mllm
A curated list of projects in awesome lists tagged with mllm.
https://github.com/microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
beit beit-3 bitnet deepnet document-ai foundation-models kosmos kosmos-1 layoutlm layoutxlm llm minilm mllm multimodal nlp pre-trained-model textdiffuser trocr unilm xlm-e
Last synced: 16 Dec 2024
https://github.com/X-PLUG/MobileAgent
Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
agent android app automation copilot gpt4v gui harmony ios mllm mobile mobile-agents multimodal multimodal-agent multimodal-large-language-models
Last synced: 11 Nov 2024
https://github.com/InternLM/InternLM-XComposer
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
chatgpt foundation gpt gpt-4 instruction-tuning language-model large-language-model large-vision-language-model llm mllm multi-modality multimodal supervised-finetuning vision-language-model vision-transformer visual-language-learning
Last synced: 14 Nov 2024
https://github.com/cambrian-mllm/cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
chatbot clip computer-vision dino instruction-tuning large-language-models llms mllm multimodal-large-language-models representation-learning
Last synced: 19 Dec 2024
https://github.com/X-PLUG/mPLUG-DocOwl
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
chart-understanding document-understanding mllm multimodal multimodal-large-language-models table-understanding
Last synced: 17 Nov 2024
https://github.com/baai-dcai/bunny
A family of lightweight multimodal models.
chatgpt chinese english gpt-4 mllm multimodal-large-language-models vlm
Last synced: 09 Nov 2024
https://github.com/BradyFU/Woodpecker
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.
hallucination hallucinations large-language-models llm mllm multimodal-large-language-models multimodality
Last synced: 16 Nov 2024
https://github.com/foundationvision/groma
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
foundation-models grounding large-language-models llama llama2 llm mllm multimodal vision-language-model
Last synced: 21 Dec 2024
https://github.com/NVlabs/EAGLE
EAGLE: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
demo eagle gpt4 huggingface large-language-models llama llama3 llava llm lmm lvlm mllm nvdia
Last synced: 26 Sep 2024
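
EAGLE's central idea is to fuse features from several complementary vision encoders before handing them to the LLM. Below is a minimal PyTorch sketch of one simple fusion strategy (channel-wise concatenation followed by a linear projection); the module, dimensions, and encoder choices are illustrative assumptions, not EAGLE's actual code.

```python
# Hypothetical sketch of a "mixture of encoders" fusion (not EAGLE's actual code).
import torch
import torch.nn as nn

class MixtureOfEncoders(nn.Module):
    """Fuse per-patch features from two vision encoders by channel concatenation,
    then project into the LLM's embedding space."""
    def __init__(self, dim_a: int, dim_b: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Linear(dim_a + dim_b, llm_dim)

    def forward(self, feats_a: torch.Tensor, feats_b: torch.Tensor) -> torch.Tensor:
        # feats_a: (batch, num_patches, dim_a); feats_b: (batch, num_patches, dim_b)
        fused = torch.cat([feats_a, feats_b], dim=-1)  # (batch, num_patches, dim_a + dim_b)
        return self.proj(fused)                        # (batch, num_patches, llm_dim)

# Example: a 1024-d CLIP-like encoder plus a 768-d ConvNeXt-like encoder for a 4096-d LLM.
fusion = MixtureOfEncoders(1024, 768, 4096)
tokens = fusion(torch.randn(1, 576, 1024), torch.randn(1, 576, 768))
print(tokens.shape)  # torch.Size([1, 576, 4096])
```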
https://github.com/Coobiw/MPP-LLaVA
Personal Project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {sft/conversations}. Don't let poverty limit your imagination! Train your own 8B/14B LLaVA-style MLLM on a 24 GB RTX 3090/4090.
deepspeed fine-tuning mllm model-parallel multimodal-large-language-models pipeline-parallelism pretraining qwen video-language-model video-large-language-models
Last synced: 16 Oct 2024
https://github.com/X-PLUG/Youku-mPLUG
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks
benchmark chinese dataset mllm multimodal multimodal-large-language-models multimodal-pretraining video video-question-answering video-retrieval youku
Last synced: 09 Nov 2024
https://github.com/baaivision/eve
[NeurIPS'24 Spotlight] EVE: Encoder-Free Vision-Language Models
clip encoder-free-vlm instruction-following large-language-models llm mllm multimodal-large-language-models vision-language-models vlm
Last synced: 20 Dec 2024
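
EVE removes the pretrained vision encoder and feeds patch-level embeddings of the raw image directly into the LLM. A minimal sketch of such an encoder-free front end is shown below, assuming a single strided convolution as the patch embedder; layer names and sizes are illustrative, not EVE's actual architecture.

```python
# Hypothetical patch-embedding front end for an encoder-free VLM (not EVE's actual code).
import torch
import torch.nn as nn

class PatchEmbedder(nn.Module):
    """Turn raw pixels into a sequence of LLM-dimensional visual tokens
    with one strided convolution instead of a pretrained vision encoder."""
    def __init__(self, llm_dim: int = 4096, patch_size: int = 14):
        super().__init__()
        self.proj = nn.Conv2d(3, llm_dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # images: (batch, 3, H, W) -> (batch, num_patches, llm_dim)
        x = self.proj(images)                # (batch, llm_dim, H/ps, W/ps)
        return x.flatten(2).transpose(1, 2)  # (batch, num_patches, llm_dim)

tokens = PatchEmbedder()(torch.randn(1, 3, 336, 336))
print(tokens.shape)  # torch.Size([1, 576, 4096])
```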
https://github.com/gokayfem/ComfyUI_VLM_nodes
Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation
comfyui custom-nodes image-captioning img2sfx img2text joytag llava llm mllm nodes phi15 siglip vlm
Last synced: 22 Nov 2024
https://tiger-ai-lab.github.io/Mantis/
Official code for the paper "Mantis: Multi-Image Instruction Tuning"
fuyu language llava-llama3 lmm mantis mllm multi-image-understanding multimodal video vision vlm
Last synced: 07 Nov 2024
https://github.com/baaivision/densefusion
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
image-descriptions mllm multimodal-large-language-models vision-language-models visual-perception vlm
Last synced: 16 Dec 2024
https://github.com/bz-lab/AUITestAgent
AUITestAgent is the first automatic, natural language-driven GUI testing tool for mobile apps, capable of fully automating the entire process of GUI interaction and function verification.
agent automation gpt-4o gui llm mllm mobile-app multi-agent multimodal multimodal-agent testing
Last synced: 08 Nov 2024
https://github.com/thu-ml/MMTrustEval
A toolbox for benchmarking the trustworthiness of multimodal large language models (MultiTrust, NeurIPS 2024 Datasets and Benchmarks Track)
benchmark claude fairness gpt-4 mllm multi-modal privacy robustness safety toolbox trustworthy-ai truthfulness
Last synced: 02 Dec 2024
https://github.com/microsoft/eureka-ml-insights
A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings.
ai artificial-intelligence evaluation-framework llm machine-learning mllm
Last synced: 17 Dec 2024
https://github.com/foundationvision/generateu
[CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection
mllm multimodality object-detection open-vocabulary open-vocabulary-detection open-world
Last synced: 05 Nov 2024
https://github.com/niutrans/vision-llm-alignment
This repository contains the code for SFT, RLHF, and DPO, designed for vision-based LLMs, including the LLaVA models and the LLaMA-3.2-vision models.
alignment dpo llama3-vision llava llm mllm multi-model ppo reward rlhf sft vision
Last synced: 18 Nov 2024
https://github.com/buaadreamer/chinese-llava-med
Chinese Medical Multimodal Large Model (Large Chinese Language-and-Vision Assistant for BioMedicine)
ai chinese gpt4v huggingface-datasets llama-factory llava medical minigpt4 mllm multimodal qwen1-5 transformers
Last synced: 06 Dec 2024
https://github.com/kwaivgi/uniaa
Unified Multi-modal IAA Baseline and Benchmark
benchmark dataset image-aesthetic-assessment llava mllm
Last synced: 09 Nov 2024
https://github.com/waltonfuture/Diff-eRank
Code for Diff-eRank, a rank-based evaluation metric for LLMs and MLLMs (NeurIPS 2024): https://arxiv.org/abs/2401.17139
evaluation-metrics llm llm-inference machine-learning mllm neurips-2024
Last synced: 26 Nov 2024
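
Diff-eRank evaluates models by how the effective rank of their hidden representations changes. As a rough illustration, the sketch below computes effective rank using the common Roy–Vetterli definition (exponential of the entropy of the normalized singular values); the paper's exact normalization and comparison protocol may differ.

```python
# Hedged sketch: effective rank of a representation matrix, in the spirit of Diff-eRank.
import numpy as np

def effective_rank(reps: np.ndarray) -> float:
    """reps: (num_tokens, hidden_dim) matrix of hidden representations."""
    s = np.linalg.svd(reps - reps.mean(axis=0), compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]
    return float(np.exp(-(p * np.log(p)).sum()))

# Synthetic check: shrinking most singular values lowers the effective rank,
# which is the kind of change a Diff-eRank-style comparison tracks between two models.
flat = np.random.randn(512, 768)
compressed = flat @ np.diag(np.linspace(1.0, 0.01, 768))
print(effective_rank(flat), effective_rank(compressed))
```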
https://github.com/buaadreamer/mllm-finetuning-demo
Example code for fine-tuning multimodal LLMs with LLaMA-Factory (Demo of Finetuning Multimodal LLM with LLaMA-Factory)
finetune-llm huggingface-datasets llama-factory llava lora mllm paligemma pretraining supervised-finetuning transformers yi-vl
Last synced: 06 Dec 2024
https://github.com/hewei2001/reachqa
Code & Dataset for Paper: "Distill Visual Chart Reasoning Ability from LLMs to MLLMs"
Last synced: 04 Dec 2024
https://github.com/showlab/visincontext
Official implementation of Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning
efficient in-context-learning llm mllm
Last synced: 09 Nov 2024
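
The underlying trick here is to render long in-context text as images so the model consumes it through (cheaper) visual tokens. A rough Pillow sketch of such rendering follows; the layout parameters are arbitrary assumptions, not the project's actual preprocessing.

```python
# Hedged sketch: render text into an image so it can be fed to a model as visual tokens.
from PIL import Image, ImageDraw

def render_text_to_image(text: str, width: int = 448, line_height: int = 16,
                         chars_per_line: int = 64) -> Image.Image:
    lines = [text[i:i + chars_per_line] for i in range(0, len(text), chars_per_line)]
    image = Image.new("RGB", (width, line_height * max(len(lines), 1)), "white")
    draw = ImageDraw.Draw(image)
    for row, line in enumerate(lines):
        draw.text((4, row * line_height), line, fill="black")  # default PIL font
    return image

# render_text_to_image(long_document).save("context.png")  # `long_document` is a placeholder
```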
https://github.com/xirui-li/MOSSBench
An implementation for MLLM oversensitivity evaluation
alignment attack mllm oversensitivity vlm
Last synced: 02 Dec 2024
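
MOSSBench probes oversensitivity, i.e. models refusing benign multimodal prompts. The toy sketch below shows the kind of refusal-rate metric involved; `model` and the refusal markers are hypothetical stand-ins, not MOSSBench's actual API or scoring rule.

```python
# Toy refusal-rate metric over benign image-text prompts (hypothetical, not MOSSBench's code).
from typing import Callable, Iterable, Tuple

REFUSAL_MARKERS = ("i cannot", "i can't", "i'm sorry", "i am unable")

def refusal_rate(model: Callable[[str, str], str],
                 benign_samples: Iterable[Tuple[str, str]]) -> float:
    """benign_samples: (image_path, question) pairs that are safe to answer."""
    samples = list(benign_samples)
    refusals = sum(
        any(marker in model(image, question).lower() for marker in REFUSAL_MARKERS)
        for image, question in samples
    )
    return refusals / max(len(samples), 1)
```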
https://github.com/buaadreamer/qwen2-vl-history
A LLaMA-Factory fine-tuning case study of Qwen2-VL in the culture-and-tourism domain (historical literature and museums)
beauty history llama-factory mllm multimodal-large-language-models museum qwen2-vl supervised-finetuning
Last synced: 06 Dec 2024
https://github.com/freedomintelligence/trim
We introduce a new approach, Token Reduction using CLIP Metric (TRIM), aimed at improving the efficiency of MLLMs without sacrificing their performance.
llm mllm multimodal vision-and-language vision-language-model vlm
Last synced: 17 Nov 2024
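
TRIM drops the visual tokens that matter least for the text query, using CLIP similarity as the signal. The sketch below simply keeps the top-k visual tokens most similar to a text embedding; TRIM's actual selection rule and its handling of discarded tokens may differ.

```python
# Hedged sketch of CLIP-similarity-based visual token reduction, in the spirit of TRIM.
import torch
import torch.nn.functional as F

def reduce_visual_tokens(visual_tokens: torch.Tensor,
                         text_embedding: torch.Tensor,
                         keep: int) -> torch.Tensor:
    # visual_tokens: (num_patches, dim); text_embedding: (dim,)
    sims = F.cosine_similarity(visual_tokens, text_embedding.unsqueeze(0), dim=-1)
    top = sims.topk(keep).indices.sort().values  # keep original patch order
    return visual_tokens[top]

kept = reduce_visual_tokens(torch.randn(576, 512), torch.randn(512), keep=128)
print(kept.shape)  # torch.Size([128, 512])
```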
https://github.com/tychenjiajun/exif-ai
A Node.js CLI and library that uses OpenAI, Ollama, ZhipuAI, Google Gemini or Coze to write AI-generated image descriptions and/or tags to EXIF metadata based on the image's content.
ai cli cli-tool coze exif gemini image jpeg jpg llm metadata mllm ollama openai openai-api photo zhipu
Last synced: 11 Oct 2024
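
exif-ai itself is a Node.js tool; as a rough illustration of the same idea in Python, the sketch below writes a caption into a JPEG's EXIF ImageDescription tag with Pillow. The caption-generating function in the usage comment is a hypothetical placeholder.

```python
# Hedged Python sketch of writing a generated caption into EXIF (not exif-ai's code).
from PIL import Image

def write_description(path: str, description: str) -> None:
    image = Image.open(path)
    exif = image.getexif()
    exif[0x010E] = description  # 0x010E is the EXIF ImageDescription tag
    image.save(path, exif=exif)

# write_description("photo.jpg", caption_from_vision_model("photo.jpg"))  # hypothetical captioner
```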
https://github.com/pipixin321/awesome-video-mllms
:fire: :fire: :fire: Awesome MLLMs/Benchmarks for Short/Long/Streaming Video Understanding :video_camera:
awesome-list benchmarks large-language-models mllm video video-understanding
Last synced: 09 Dec 2024