Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with mllm
A curated list of projects in awesome lists tagged with mllm.
https://github.com/microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
beit beit-3 bitnet deepnet document-ai foundation-models kosmos kosmos-1 layoutlm layoutxlm llm minilm mllm multimodal nlp pre-trained-model textdiffuser trocr unilm xlm-e
Last synced: 29 Sep 2024
https://github.com/cambrian-mllm/cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
chatbot clip computer-vision dino instruction-tuning large-language-models llms mllm multimodal-large-language-models representation-learning
Last synced: 30 Sep 2024
https://github.com/InternLM/InternLM-XComposer
InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.
chatgpt foundation gpt gpt-4 instruction-tuning language-model large-language-model large-vision-language-model llm mllm multi-modality multimodal supervised-finetuning vision-language-model vision-transformer visual-language-learning
Last synced: 03 Aug 2024
https://github.com/internlm/internlm-xcomposer
InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.
chatgpt foundation gpt gpt-4 instruction-tuning language-model large-language-model large-vision-language-model llm mllm multi-modality multimodal supervised-finetuning vision-language-model vision-transformer visual-language-learning
Last synced: 01 Oct 2024
https://github.com/x-plug/mplug-docowl
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
chart-understanding document-understanding mllm multimodal multimodal-large-language-models table-understanding
Last synced: 30 Sep 2024
https://github.com/X-PLUG/mPLUG-DocOwl
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
chart-understanding document-understanding mllm multimodal multimodal-large-language-models table-understanding
Last synced: 03 Aug 2024
https://github.com/baai-dcai/bunny
A family of lightweight multimodal models.
chatgpt chinese english gpt-4 mllm multimodal-large-language-models vlm
Last synced: 02 Aug 2024
https://github.com/BradyFU/Woodpecker
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.
hallucination hallucinations large-language-models llm mllm multimodal-large-language-models multimodality
Last synced: 03 Aug 2024
https://github.com/foundationvision/groma
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
foundation-models grounding large-language-models llama llama2 llm mllm multimodal vision-language-model
Last synced: 27 Sep 2024
https://github.com/nvlabs/eagle
EAGLE: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
demo eagle gpt4 huggingface large-language-models llama llama3 llava llm lmm lvlm mllm nvdia
Last synced: 01 Oct 2024
https://github.com/NVlabs/EAGLE
EAGLE: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
demo eagle gpt4 huggingface large-language-models llama llama3 llava llm lmm lvlm mllm nvdia
Last synced: 26 Sep 2024
https://github.com/X-PLUG/Youku-mPLUG
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks
benchmark chinese dataset mllm multimodal multimodal-large-language-models multimodal-pretraining video video-question-answering video-retrieval youku
Last synced: 02 Aug 2024
https://github.com/gokayfem/ComfyUI_VLM_nodes
Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation
comfyui custom-nodes image-captioning img2sfx img2text joytag llava llm mllm nodes phi15 siglip vlm
Last synced: 05 Aug 2024
https://tiger-ai-lab.github.io/Mantis/
Official code for the paper "Mantis: Multi-Image Instruction Tuning"
fuyu language llava-llama3 lmm mantis mllm multi-image-understanding multimodal video vision vlm
Last synced: 01 Aug 2024
https://github.com/thu-ml/MMTrustEval
A toolbox for benchmarking trustworthiness of multimodal large language models (MultiTrust)
benchmark claude fairness gpt-4 mllm multi-modal privacy robustness safety toolbox trustworthy-ai truthfulness
Last synced: 12 Aug 2024
https://github.com/microsoft/eureka-ml-insights
A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings.
ai artificial-intelligence evaluation-framework llm machine-learning mllm
Last synced: 28 Sep 2024
https://github.com/bz-lab/AUITestAgent
AUITestAgent is the first automatic, natural language-driven GUI testing tool for mobile apps, capable of fully automating the entire process of GUI interaction and function verification.
agent automation gpt-4o gui llm mllm mobile-app multi-agent multimodal multimodal-agent testing
Last synced: 01 Aug 2024
https://github.com/tychenjiajun/exif-ai
A Node.js CLI and library that uses OpenAI, Ollama, ZhipuAI, Google Gemini or Coze to write AI-generated image descriptions and/or tags to EXIF metadata based on the image's content.
ai cli cli-tool coze exif gemini image jpeg jpg llm metadata mllm ollama openai openai-api photo zhipu
Last synced: 27 Sep 2024
https://github.com/xirui-li/MOSSBench
MOSSBench: A webpage for an oversensitivity benchmark
alignment attack mllm oversensitivity vlm
Last synced: 12 Aug 2024