Projects in Awesome Lists tagged with vision-models

https://github.com/mdgrey33/pyvisionai

The PyVisionAI Official Repo

claude-3-5-sonnet computer llama localllm ocr ollama open-source openai python vision vision-models vlm

Last synced: 23 Oct 2025

https://github.com/afondiel/computer-vision-challenge

A hands-on collection of foundational computer vision projects for everyone.

cnn computer-vision computer-vision-algorithms computer-vision-challenge computer-vision-datasets computer-vision-hello-world computer-vision-opencv computer-vision-projects computer-vision-python computer-vision-tools cv-challenge image-classification image-detection image-generation image-processing lvm vision-models vision-transformer vlm

Last synced: 07 Apr 2025

https://github.com/papperrollinggery/zhijuan-prompt-card

Local-first Chrome extension that turns images into generator-ready prompts.

ai-tools byok chrome-extension image-to-prompt local-first manifest-v3 openai-compatible prompt-engineering react typescript vision-models vite

Last synced: 24 Jun 2026

https://github.com/kyegomez/visionllama

Implementation of VisionLLaMA from the paper: "VisionLLaMA: A Unified LLaMA Interface for Vision Tasks" in PyTorch and Zeta

ai deep-learning multi-modal vision-models vision-transformers vit

Last synced: 23 Jul 2025

https://github.com/aws-samples/sample-badgers

Guidance on deploying generative AI document analysis with Amazon Bedrock AgentCore. Auto-classifies, enhances, and aggregates multi-type documents using Gestalt-informed vision prompts. Custom analyzer creation wizard. Scripted CDK deployment. Gradio frontend included.

agentcore agentcore-sdk agentic-ai agentic-workflow amazon-nova badgers cdk claude composable-prompts document-extraction document-intelligence document-vision full-text-extraction gestalt prompt-engineering strands-agent-sdk strands-agents vision-models

Last synced: 14 Apr 2026

https://github.com/kyegomez/midas

Implementation of MiDaS from the paper "Towards Robust Monocular Depth Estimation" in PyTorch and Zeta

ai artificial-intelligence ml multi-modal parallel python pytorch tensorflow vision-models

Last synced: 25 Oct 2025

https://github.com/pavansomisetty21/image-caption-generation-using-llms-gemini-

Generates captions for user-supplied images using prompt engineering and generative AI.

artificial-intelligence description encoder-decoder-architecture gemini gemini-api gen-ai generate-contents generative-ai generativemodel image image-caption-generator image-captioning image-descriptions image-descriptor multi-model-learning openai vision vision-language-model vision-models visual-models

Last synced: 28 Jun 2025

https://github.com/the-swarm-corporation/swarm-models

A simple-to-use package for calling various model providers such as OpenAI, Anthropic, and others with utmost reliability, security, and performance.

agents ai computer-vision enterprise-grade library llms ml production-ready swarms tool usage vision-models

Last synced: 27 Jul 2025

https://github.com/codelined-ag/extracto

Your private document brain. PDFs in, RAG out. Self-hosted. Plug everywhere.

agents bun claude docker document-processing mcp mcp-server mistral nextjs ocr ollama openrouter pdf-ocr rag self-hosted vector-database vision-models

Last synced: 10 May 2026

https://github.com/major196512/vistem

General Vision Model Training Template

pytorch vision-models

Last synced: 08 May 2025

https://github.com/the-swarm-corporation/dart

DART (Diffusion-Autoregressive Recursive Transformer) is a novel hybrid architecture that combines diffusion-based and autoregressive approaches for text generation.

agents anthropic attention autogressive diffusion dit gpts llms midjourney openai research text-generation torch transformers vision-models