Projects in Awesome Lists tagged with visual-language-models
A curated list of projects in awesome lists tagged with visual-language-models.
https://github.com/thudm/cogvlm
a state-of-the-art-level open visual language model | multimodal pretrained model
cross-modality language-model multi-modal pretrained-models visual-language-models
Last synced: 14 May 2025
https://github.com/THUDM/CogVLM
a state-of-the-art-level open visual language model | multimodal pretrained model
cross-modality language-model multi-modal pretrained-models visual-language-models
Last synced: 28 Mar 2025
https://github.com/camel-ai/crab
🦀️ CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents. https://crab.camel-ai.org/
gui-automation language-model-agent large-language-models multi-agent-systems visual-language-models
Last synced: 15 May 2025
https://github.com/bilel-bj/ROSGPT_Vision
Commanding robots using only Language Models' prompts
chatgpt language-models language-models-are-next large-language-models llm prompt-engineering prompting-robotic-modalities robotic-design-patterns robotic-vision robotics ros2 visual-language-models
Last synced: 24 Mar 2025
https://github.com/hk-zh/language-conditioned-robot-manipulation-models
https://arxiv.org/abs/2312.10807
foundation-models imitation-learning language-conditioned-learning large-languge-models neural-symbolic reinforcement-learning robot-manipulation visual-language-models
Last synced: 09 Oct 2025
https://github.com/tianyu-z/vcr
Official Repo for the paper: VCR: Visual Caption Restoration. Check arxiv.org/pdf/2406.06462 for details.
benchmark deep-learning visual-language-models
Last synced: 07 Oct 2025
https://github.com/amathislab/wildclip
Scene and animal attribute retrieval from camera trap data with domain-adapted vision-language models
behavior camera-trap clip computer-vision computervision visual-language-models
Last synced: 03 Feb 2026
https://github.com/declare-lab/sealing
[NAACL 2024] Official Implementation of paper "Self-Adaptive Sampling for Efficient Video Question Answering on Image-Text Models"
multimodality naacl2024 video-question-answering video-understanding visual-language-models
Last synced: 14 Apr 2025
https://github.com/shreydan/vlm-od
experimental: finetune smolVLM on COCO (without any special <locXYZ> tokens)
computer-vision deep-learning llm object-detection transformers visual-language-models vlm
Last synced: 28 Jun 2025
https://github.com/rooshikeshbhatt/item-inspector-ai
AI-based product condition detection using BLIP-2 + FastAPI + Phi-4 (Ollama)
ai blip2 computer-vision condition-scoring ecommerce-ai fastapi hugging image-analysis image-tagging multimodal-ai natural-language-generation ollama open-source phi4 product-inspection prompt-engineering pyotrch python visual-language-models zero-shot-learning
Last synced: 09 May 2026
https://github.com/rooshikesh/item-inspector-ai
AI-based product condition detection using BLIP-2 + FastAPI + Phi-4 (Ollama)
ai blip2 computer-vision condition-scoring ecommerce-ai fastapi hugging image-analysis image-tagging multimodal-ai natural-language-generation ollama open-source phi4 product-inspection prompt-engineering pyotrch python visual-language-models zero-shot-learning
Last synced: 18 Jun 2025
https://github.com/legalaspro/modern_ai_foundations
A collection of implementations exploring modern AI architectures and foundational models.
cvae diffusion-models flowmatching vae vae-pytorch vision-transformer visual-language-models vlms
Last synced: 23 Jun 2025