Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with mllm

A curated list of projects in awesome lists tagged with mllm.

https://github.com/microsoft/unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

beit beit-3 bitnet deepnet document-ai foundation-models kosmos kosmos-1 layoutlm layoutxlm llm minilm mllm multimodal nlp pre-trained-model textdiffuser trocr unilm xlm-e

Last synced: 29 Sep 2024

https://github.com/InternLM/InternLM-XComposer

InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.

chatgpt foundation gpt gpt-4 instruction-tuning language-model large-language-model large-vision-language-model llm mllm multi-modality multimodal supervised-finetuning vision-language-model vision-transformer visual-language-learning

Last synced: 03 Aug 2024

https://github.com/internlm/internlm-xcomposer

InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.

chatgpt foundation gpt gpt-4 instruction-tuning language-model large-language-model large-vision-language-model llm mllm multi-modality multimodal supervised-finetuning vision-language-model vision-transformer visual-language-learning

Last synced: 01 Oct 2024

https://github.com/x-plug/mplug-docowl

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

chart-understanding document-understanding mllm multimodal multimodal-large-language-models table-understanding

Last synced: 30 Sep 2024

https://github.com/X-PLUG/mPLUG-DocOwl

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

chart-understanding document-understanding mllm multimodal multimodal-large-language-models table-understanding

Last synced: 03 Aug 2024

https://github.com/baai-dcai/bunny

A family of lightweight multimodal models.

chatgpt chinese english gpt-4 mllm multimodal-large-language-models vlm

Last synced: 02 Aug 2024

https://github.com/BradyFU/Woodpecker

✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.

hallucination hallucinations large-language-models llm mllm multimodal-large-language-models multimodality

Last synced: 03 Aug 2024

https://github.com/foundationvision/groma

[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization

foundation-models grounding large-language-models llama llama2 llm mllm multimodal vision-language-model

Last synced: 27 Sep 2024

https://github.com/nvlabs/eagle

EAGLE: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

demo eagle gpt4 huggingface large-language-models llama llama3 llava llm lmm lvlm mllm nvdia

Last synced: 01 Oct 2024

https://github.com/NVlabs/EAGLE

EAGLE: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

demo eagle gpt4 huggingface large-language-models llama llama3 llava llm lmm lvlm mllm nvdia

Last synced: 26 Sep 2024

https://github.com/X-PLUG/Youku-mPLUG

Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks

benchmark chinese dataset mllm multimodal multimodal-large-language-models multimodal-pretraining video video-question-answering video-retrieval youku

Last synced: 02 Aug 2024

https://github.com/gokayfem/ComfyUI_VLM_nodes

Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation

comfyui custom-nodes image-captioning img2sfx img2text joytag llava llm mllm nodes phi15 siglip vlm

Last synced: 05 Aug 2024

https://tiger-ai-lab.github.io/Mantis/

Official code for Paper "Mantis: Multi-Image Instruction Tuning"

fuyu language llava-llama3 lmm mantis mllm multi-image-understanding multimodal video vision vlm

Last synced: 01 Aug 2024

https://github.com/thu-ml/MMTrustEval

A toolbox for benchmarking trustworthiness of multimodal large language models (MultiTrust)

benchmark claude fairness gpt-4 mllm multi-modal privacy robustness safety toolbox trustworthy-ai truthfulness

Last synced: 12 Aug 2024

https://github.com/microsoft/eureka-ml-insights

A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings.

ai artificial-intelligence evaluation-framework llm machine-learning mllm

Last synced: 28 Sep 2024

https://github.com/bz-lab/AUITestAgent

AUITestAgent is the first automatic, natural language-driven GUI testing tool for mobile apps, capable of fully automating the entire process of GUI interaction and function verification.

agent automation gpt-4o gui llm mllm mobile-app multi-agent multimodal multimodal-agent testing

Last synced: 01 Aug 2024

https://github.com/tychenjiajun/exif-ai

A Node.js CLI and library that uses OpenAI, Ollama, ZhipuAI, Google Gemini or Coze to write AI-generated image descriptions and/or tags to EXIF metadata based on the image's content.

ai cli cli-tool coze exif gemini image jpeg jpg llm metadata mllm ollama openai openai-api photo zhipu

Last synced: 27 Sep 2024

https://github.com/xirui-li/MOSSBench

MOSSBench: A webpage for an oversensitivity benchmark

alignment attack mllm oversensitivity vlm

Last synced: 12 Aug 2024