Projects in Awesome Lists tagged with vision-models
A curated list of projects in awesome lists tagged with vision-models.
https://github.com/mdgrey33/pyvisionai
The PyVisionAI Official Repo
claude-3-5-sonnet computer llama localllm ocr ollama open-source openai python vision vision-models vlm
Last synced: 23 Oct 2025
https://github.com/afondiel/computer-vision-challenge
A hands-on collection of foundational computer vision projects for everyone.
cnn computer-vision computer-vision-algorithms computer-vision-challenge computer-vision-datasets computer-vision-hello-world computer-vision-opencv computer-vision-projects computer-vision-python computer-vision-tools cv-challenge image-classification image-detection image-generation image-processing lvm vision-models vision-transformer vlm
Last synced: 07 Apr 2025
https://github.com/kyegomez/visionllama
Implementation of VisionLLaMA from the paper: "VisionLLaMA: A Unified LLaMA Interface for Vision Tasks" in PyTorch and Zeta
ai deep-learning multi-modal vision-models vision-transformers vit
Last synced: 23 Jul 2025
https://github.com/aws-samples/sample-badgers
Guidance on deploying a generative AI document analysis solution with Amazon Bedrock AgentCore. Auto-classifies, enhances, and aggregates multi-type documents using Gestalt-informed vision prompts. Includes a custom analyzer creation wizard, scripted CDK deployment, and a Gradio frontend.
agentcore agentcore-sdk agentic-ai agentic-workflow amazon-nova badgers cdk claude composable-prompts document-extraction document-intelligence document-vision full-text-extraction gestalt prompt-engineering strands-agent-sdk strands-agents vision-models
Last synced: 14 Apr 2026
https://github.com/kyegomez/midas
Implementation of Midas from the paper "Towards Robust Monocular Depth Estimation" in PyTorch and Zeta
ai artificial-intelligence ml multi-modal parallel python pytorch tensorflow vision-models
Last synced: 25 Oct 2025
https://github.com/pavansomisetty21/image-caption-generation-using-llms-gemini-
Generates captions for user-supplied images using prompt engineering and generative AI.
artificial-intelligence description encoder-decoder-architecture gemini gemini-api gen-ai generate-contents generative-ai generativemodel image image-caption-generator image-captioning image-descriptions image-descriptor multi-model-learning openai vision vision-language-model vision-models visual-models
Last synced: 28 Jun 2025
https://github.com/the-swarm-corporation/swarm-models
A simple-to-use package for calling various model providers, such as OpenAI, Anthropic, and others, with utmost reliability, security, and performance.
agents ai computer-vision enterprise-grade library llms ml production-ready swarms tool usage vision-models
Last synced: 27 Jul 2025
https://github.com/codelined-ag/extracto
Your private document brain. PDFs in, RAG out. Self-hosted. Plug everywhere.
agents bun claude docker document-processing mcp mcp-server mistral nextjs ocr ollama openrouter pdf-ocr rag self-hosted vector-database vision-models
Last synced: 10 May 2026
https://github.com/major196512/vistem
General Vision Model Training Template
Last synced: 08 May 2025
https://github.com/the-swarm-corporation/dart
DART (Diffusion-Autoregressive Recursive Transformer) is a novel hybrid architecture that combines diffusion-based and autoregressive approaches for text generation.
agents anthropic attention autogressive diffusion dit gpts llms midjourney openai research text-generation torch transformers vision-models
Last synced: 31 Jul 2025
https://github.com/afondiel/prompt-engineering-for-vision-models-deeplearningai
These notes and resources are compiled from the crash course Prompt Engineering for Vision Models offered by DeepLearning.AI.
cnn computer-vision convnets diffusion-models fine-tuning generative-models image-processing large-vision-language-models large-vision-models meta-sam prompt-engineering video-processing vision-language-model vision-model-prompting vision-models visual-prompting vit
Last synced: 25 Aug 2025
https://github.com/shivendrra/ava
Building AVA from Ex Machina: a lightweight multimodal system built from scratch, just for learning and experimentation.
audio-classification audio-engine audio-transformers large-language-models llm machine-learning swin-transformer transformer vision vision-engine vision-models vision-transformer
Last synced: 31 Mar 2025
https://github.com/afondiel/how-diffusion-models-work-crash-course-dlai
Diffusion models crash course from DeepLearning.AI, with PyTorch
computer-vision conditional-diffusion conditional-generation diffusion-models genai generative-ai latent-diffusion latent-space unconditional-generation vision-models
Last synced: 16 Oct 2025
https://github.com/antonio-f/moondream
Testing the Moondream tiny vision model
artificial-intelligence hands-on huggingface-transformers image-captioning image-descriptions language-models running-locally tiny-models tutorial vision-models vision-transformers
Last synced: 30 Mar 2025