{"id":49955330,"url":"https://github.com/jonykarmakar/vision-command-ai","last_synced_at":"2026-07-02T16:00:53.527Z","repository":{"id":356359635,"uuid":"1231884362","full_name":"JonyKarmakar/vision-command-ai","owner":"JonyKarmakar","description":"End-to-end AI computer vision studio with YOLO detection, crop/blur editing, command workflows, PostgreSQL logging, Docker, CI/CD, and Render deployment.","archived":false,"fork":false,"pushed_at":"2026-06-30T00:31:02.000Z","size":1140,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-30T02:20:05.913Z","etag":null,"topics":["computer-vision","docker","docker-compose","fastapi","full-stack-ai-object-detection","github-actions","llmops","mlops","mlops-workflow","postgresql","react","render-deployment","typescript","yolo"],"latest_commit_sha":null,"homepage":"https://vision-command-frontend.onrender.com","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/JonyKarmakar.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-07T11:32:25.000Z","updated_at":"2026-06-30T00:31:06.000Z","dependencies_parsed_at":"2026-07-02T16:00:52.169Z","dependency_job_id":null,"html_url":"https://github.com/JonyKarmakar/vision-command-ai","commit_stats":null,"previous_names":["jonykarmakar/vision-command-ai"],"tags_count":7,"template":false,"template_full_name":null,"purl":"pkg:github/JonyKarmakar/vision-command-ai","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JonyKarmakar%2Fvision-command-ai","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JonyKarmakar%2Fvision-command-ai/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JonyKarmakar%2Fvision-command-ai/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JonyKarmakar%2Fvision-command-ai/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/JonyKarmakar","download_url":"https://codeload.github.com/JonyKarmakar/vision-command-ai/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JonyKarmakar%2Fvision-command-ai/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":35053492,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-07-02T02:00:06.368Z","response_time":173,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","docker","docker-compose","fastapi","full-stack-ai-object-detection","github-actions","llmops","mlops","mlops-workflow","postgresql","react","render-deployment","typescript","yolo"],"created_at":"2026-05-17T23:03:46.600Z","updated_at":"2026-07-02T16:00:53.515Z","avatar_url":"https://github.com/JonyKarmakar.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# VisionCommand AI\n\nVisionCommand AI is a full-stack AI media assistant for image and video analysis, editing, workflow history, and AI-assisted command execution.\n\nUsers can upload images or videos, detect objects, crop or blur detected regions, zoom into targets, extract and analyze video frames, track objects across video, and review completed workflows through clean product-facing history panels.\n\nThe project is built as more than a computer vision demo. It combines practical Computer Vision, LLM-assisted command intelligence, full-stack AI engineering, workflow observability, PostgreSQL-backed persistence, Docker, CI/CD, and production-style development practices.\n\n---\n\n## Demo Status\n\nVisionCommand AI currently supports a polished local demo flow with two modes.\n\n### User Mode\n\nUser Mode provides a clean product-facing experience.\n\nIt focuses on:\n\n- Clean image and video upload flows\n- Object detection and editing results\n- Generated output history\n- Video command history\n- Ordered result panels\n- Simple View result navigation\n- Minimal technical/debug wording\n\n### Developer Mode\n\nDeveloper Mode keeps the engineering depth visible.\n\nIt preserves:\n\n- Original and stored filenames\n- Content types and media metadata\n- JSON copy and download actions\n- Parser and planner metadata\n- LLMOps and observability panels\n- Database summaries and logs\n- Generated output analytics\n- Assistant/debug result cards\n\nThis separation allows the same application to work as both a clean AI product demo and a technical engineering showcase.\n\n---\n\n## Live Demo\n\nA first public Render deployment was completed earlier and documented in the project docs.\n\nFrontend demo:\n\n```text\nhttps://vision-command-frontend.onrender.com\n```\n\nBackend demo:\n\n```text\nhttps://vision-command-backend.onrender.com\n```\n\nRender free-tier limitations still apply. The backend may sleep after inactivity, first requests can be slow, YOLO inference is slower on free instances, and uploaded/generated media use temporary container storage unless a persistent storage strategy is added.\n\nDeployment notes are available in:\n\n```text\ndocs/render-deployment-evidence.md\ndocs/render-first-deployment-runbook.md\ndocs/render-troubleshooting-notes.md\n```\n\n---\n\n## Core Capabilities\n\n### Image workflows\n\n- Upload and preview images\n- Run object detection\n- View annotated detection outputs\n- Filter detections by confidence and class\n- Crop detected objects\n- Blur detected objects\n- Blur all objects of a selected class\n- Zoom into detected objects\n- Run detection again on generated outputs\n- Reuse generated outputs as active image sources\n- Review generated output history, grouping, analytics, and lineage\n\n### Video workflows\n\n- Upload and preview videos\n- Extract video metadata such as duration, FPS, frame count, width, and height\n- Trim video clips\n- Extract a single frame from a timestamp\n- Extract multiple frames from a time range\n- Run detection on extracted frames\n- Run sampled detection across video\n- Track objects across sampled video frames\n- Review completed video actions through Video command history\n- Navigate back to video results using View result actions\n- Show video result panels in the order actions were completed\n\n### AI Assistant workflows\n\n- Run image and video commands through text input\n- Use browser-based voice input for supported commands\n- Execute commands such as crop, blur, zoom, trim video, extract frame, detect frames, and track video\n- Support rule-based, mock LLM, and real LLM command paths\n- Preview command plans and prepared execution outputs\n- Preserve technical command outputs in Developer Mode\n\n---\n\n## Demo Flows\n\n### Image demo flow\n\n```text\nUpload image\nDetect objects\nFilter detections\nCrop or blur detected objects\nUse AI Assistant to zoom into a target\nReview generated outputs\nUse a generated output as the active image\nRun detection on generated output\n```\n\n### Video demo flow\n\n```text\nUpload video\nTrim video\nExtract one frame\nExtract multiple frames\nDetect objects across video\nDetect objects on extracted frames\nTrack objects across video\nReview Video command history\nUse View result navigation\n```\n\n### Example AI Assistant commands\n\n```text\ndetect objects\ncrop person\nblur person\nblur all persons\nzoom person\nextract frame at 1 second\nextract frames from 0 to 3 seconds\ndetect frames from 0 to 3 seconds\ntrim video from 0 to 2 seconds\ntrack video from 0 to 3 seconds\ntrack person from 0 to 3 seconds\n```\n\n---\n\n## Architecture\n\n```text\nReact + TypeScript frontend\n        |\n        | /api requests\n        v\nFastAPI backend\n        |\n        | AI/media services\n        v\nYOLO, OpenCV, Pillow, FFmpeg, PyTorch\n        |\n        | optional persistence\n        v\nPostgreSQL\n```\n\n### Frontend\n\nThe frontend is a React and TypeScript application built with Vite. It manages User Mode and Developer Mode, media upload controls, image and video result panels, AI Assistant command UI, generated output history, video command history, workspace recovery panels, and observability dashboards.\n\n### Backend\n\nThe backend is a FastAPI service. It handles image upload, video upload, object detection, image crop/blur/zoom workflows, video trimming, frame extraction, sampled detection, tracking, command parsing, planning, validation, execution, LLM provider integration, PostgreSQL persistence, and JSON APIs for workflow results and logs.\n\n### Database\n\nPostgreSQL is optional for local development but supported for persistence. When configured, it stores uploaded media metadata, detection results, inference logs, command logs, parser attempt logs, generated output history, and workflow lineage data.\n\nWhen `DATABASE_URL` is not set, the app still runs with safe fallback behavior for local demos.\n\n---\n\n## Tech Stack\n\n### Frontend\n\n- React\n- TypeScript\n- Vite\n- CSS\n\n### Backend\n\n- Python\n- FastAPI\n- Uvicorn\n- Pillow\n- OpenCV\n- Ultralytics YOLO\n- PyTorch\n- imageio-ffmpeg\n- OpenAI SDK\n\n### Database and persistence\n\n- PostgreSQL\n- psycopg\n\n### DevOps\n\n- Git\n- GitHub\n- GitHub Actions\n- Docker\n- Docker Compose\n- Render deployment configuration\n\n---\n\n## Project Structure\n\n```text\nvision-command-ai/\n├── backend/\n│   ├── app/\n│   │   ├── main.py\n│   │   ├── schemas.py\n│   │   ├── routers/\n│   │   └── services/\n│   ├── tests/\n│   ├── Dockerfile\n│   └── requirements.txt\n│\n├── frontend/\n│   ├── src/\n│   │   ├── features/\n│   │   └── types/\n│   ├── Dockerfile\n│   ├── package.json\n│   └── vite.config.ts\n│\n├── docs/\n│   ├── releases/\n│   ├── api-and-feature-reference.md\n│   ├── llm-command-parser-architecture.md\n│   ├── command-planner-design.md\n│   ├── workspace-recovery-flow.md\n│   └── render-deployment-evidence.md\n│\n├── docker-compose.yml\n├── render.yaml\n└── README.md\n```\n\n---\n\n## Local Setup\n\n### Backend\n\nFrom the backend folder:\n\n```bash\ncd backend\n\npython -m venv ../vision-env\nsource ../vision-env/bin/activate\n\npip install -r requirements.txt\nenv -u DATABASE_URL uvicorn app.main:app --reload\n```\n\nThe backend runs at:\n\n```text\nhttp://127.0.0.1:8000\n```\n\n### Frontend\n\nIn a separate terminal:\n\n```bash\ncd frontend\n\nnpm install\nnpm run dev\n```\n\nThe frontend runs at:\n\n```text\nhttp://127.0.0.1:5173\n```\n\n### Optional Docker Compose setup\n\n```bash\ndocker compose up --build\n```\n\nDocker Compose is intended for running the backend, frontend, and PostgreSQL together.\n\n---\n\n## Testing\n\n### Backend tests from the project root\n\n```bash\nenv -u DATABASE_URL PYTHONPATH=backend python -m pytest backend/tests -q\n```\n\n### Backend tests from the backend folder\n\n```bash\ncd backend\n\nenv -u DATABASE_URL python -m pytest -q\n```\n\n### Frontend checks\n\n```bash\ncd frontend\n\nnpm run build\nnpm run lint\n```\n\n### Diff whitespace check\n\n```bash\ngit diff --check\n```\n\nRecent verified local test status:\n\n```text\nBackend tests: 325 passed\nFrontend build: passed\nFrontend lint: passed\n```\n\n---\n\n## CI/CD\n\nGitHub Actions validate pull requests and main branch pushes.\n\nCurrent workflows include:\n\n- Backend tests\n- Backend Docker image build\n- Frontend build\n\nThe project workflow uses:\n\n```text\nfeature branch\npull request\nCI checks\nmerge to main\npost-merge main CI verification\n```\n\nRecent demo-readiness PRs were merged only after pull request checks and post-merge main push checks passed.\n\n---\n\n## Documentation\n\nDetailed documentation is available in the `docs/` folder.\n\nImportant docs include:\n\n```text\ndocs/README.md\ndocs/api-and-feature-reference.md\ndocs/product-walkthrough.md\ndocs/architecture-overview.md\ndocs/walkthrough-assets.md\ndocs/assets/README.md\ndocs/project-vision-and-ai-roadmap.md\ndocs/llm-command-parser-architecture.md\ndocs/command-planner-design.md\ndocs/workspace-recovery-flow.md\ndocs/deployment-readiness-summary.md\ndocs/deployment-hardening-plan.md\ndocs/render-deployment-evidence.md\ndocs/releases/v0.3.0.md\ndocs/releases/v0.4.0.md\ndocs/releases/v0.5.0.md\ndocs/releases/v0.5.1.md\ndocs/releases/v0.5.2.md\n```\n\nThe detailed API and feature inventory from the previous README is preserved in:\n\n```text\ndocs/api-and-feature-reference.md\n```\n\n---\n\n## Recent Demo-Readiness Milestones\n\n### PR #455: Video command history panel\n\nAdded a Video command history panel after the AI Assistant. Completed video workflow actions are now visible in one place, with View result navigation back to matching video outputs.\n\n### PR #456: Video result ordering polish\n\nVideo result panels now follow the order actions are completed. Sampled detection results no longer appear before earlier results such as trim or extracted frames. User Mode no longer shows generic Assistant result cards for video commands.\n\n### PR #457: Final User Mode copy polish\n\nUser Mode now hides long uploaded image filenames, shows a clean image workspace readiness status, and uses **Zoomed image ready** for zoom command outputs. Developer Mode still preserves technical metadata and Assistant/debug wording.\n\n---\n\n## Current Status\n\nVisionCommand AI is currently in a polished demo-ready local state.\n\nStable areas include:\n\n- Image upload, detection, crop, blur, zoom, and generated output reuse\n- Video upload, trim, frame extraction, sampled detection, frame detection, and tracking\n- User Mode and Developer Mode separation\n- AI Assistant command execution for image and video workflows\n- Generated output history and video command history\n- LLM parser/planner tooling and observability panels\n- PostgreSQL-backed persistence where configured\n- Docker and CI-backed development workflow\n\nThe next recommended work is documentation refinement, architecture visuals, sample media references, and deployment hardening rather than additional core UI features.\n\n---\n\n## Roadmap\n\nPossible next improvements:\n\n- Add final walkthrough screenshots using the `docs/assets/` placeholder structure\n- Expand architecture visuals in `docs/architecture-overview.md` with screenshots or rendered diagrams\n- Implement deployment hardening items from `docs/deployment-hardening-plan.md`\n- Add more robust video tracking methods\n- Expand real LLM evaluation coverage\n- Add user-facing screenshots to the README\n- Prepare a portfolio case study version of the project\n\n---\n\n## License\n\nThis project is currently developed as a personal learning and portfolio project.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjonykarmakar%2Fvision-command-ai","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjonykarmakar%2Fvision-command-ai","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjonykarmakar%2Fvision-command-ai/lists"}