{"id":42068630,"url":"https://github.com/zenml-io/gpudoctor","last_synced_at":"2026-01-26T08:36:05.999Z","repository":{"id":329895033,"uuid":"1111597614","full_name":"zenml-io/gpudoctor","owner":"zenml-io","description":"A curated catalog of ML Docker base images to help engineers escape \"CUDA hell\" and choose the right container for their GPU workloads.","archived":false,"fork":false,"pushed_at":"2026-01-12T06:12:47.000Z","size":22092,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-01-12T16:13:14.460Z","etag":null,"topics":["docker","docker-image","docker-ml","gpu","machine"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zenml-io.png","metadata":{"files":{"readme":"README.md","changelog":"changelog.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2025-12-07T09:12:30.000Z","updated_at":"2025-12-09T10:09:39.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/zenml-io/gpudoctor","commit_stats":null,"previous_names":["zenml-io/gpudoctor"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/zenml-io/gpudoctor","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zenml-io%2Fgpudoctor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zenml-io%2Fgpudoctor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zenml-io%2Fgpudoctor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zenml-io%2Fgpudoctor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zenml-io","download_url":"https://codeload.github.com/zenml-io/gpudoctor/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zenml-io%2Fgpudoctor/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28770946,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-26T07:45:00.504Z","status":"ssl_error","status_checked_at":"2026-01-26T07:45:00.070Z","response_time":59,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["docker","docker-image","docker-ml","gpu","machine"],"created_at":"2026-01-26T08:36:05.310Z","updated_at":"2026-01-26T08:36:05.986Z","avatar_url":"https://github.com/zenml-io.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# GPU Doctor\n\n![](assets/gpu-doctor-screenshot.png)\n\nA curated catalog of ML Docker base images to help engineers escape \"CUDA hell\" and choose the right container for their GPU workloads.\n\n## The Problem\n\nSelecting a Docker base image for ML is painful:\n- CUDA/cuDNN version matrices are confusing\n- Image sizes range from 1GB to 20GB+ with unclear trade-offs\n- Documentation is scattered across NVIDIA, cloud providers, and framework maintainers\n- Security scan data is hard to find\n\n## The Solution\n\nGPU Doctor provides:\n- **Structured catalog** of vetted ML images (PyTorch, TensorFlow, vLLM, TGI, Triton, etc.)\n- **Consistent metadata** including CUDA versions, driver requirements, sizes, and security ratings\n- **Guided picker** to find the right image for your use case\n- **Searchable table** with filtering by framework, role, and cloud affinity\n\n## Quick Start\n\n```bash\n# Browse the catalog\ncat data/images.json | jq '.images[] | {id, name, role: .capabilities.role}'\n\n# Validate after edits\n.venv/bin/python -c \"\nimport json\nfrom jsonschema import validate\nwith open('data/schema.json') as f: schema = json.load(f)\nwith open('data/images.json') as f: data = json.load(f)\nvalidate(instance=data, schema=schema)\nprint('Valid')\n\"\n```\n\n## Project Status\n\n- [x] JSON Schema design\n- [x] Comprehensive catalog (200+ images covering PyTorch, TensorFlow, vLLM, TGI, Ollama, CUDA bases, NGC images for JAX, NeMo, Triton, and more)\n- [x] Website with guided picker (5-step wizard) and table view\n- [x] CI automation for catalog updates from registries\n- [ ] Security scan enrichment (Trivy integration)\n- [ ] Claude skill for recommendations\n\n## Catalog Automation\n\nThe catalog is automatically updated from container registries:\n\n```bash\n# Install dependencies\npip install -r scripts/requirements.txt\n\n# Dry-run (preview changes without writing)\npython scripts/update_catalog.py --dry-run --source all\n\n# Update from all registries (Docker Hub, GHCR, NGC)\npython scripts/update_catalog.py --source all\n\n# Update from a specific registry\npython scripts/update_catalog.py --source dockerhub\n```\n\n**How it works:**\n- `data/tracked_images.yaml` defines which images to track\n- The script fetches metadata from registries, parses tags, and builds catalog entries\n- Curated fields (`notes`, `recommended_for`, etc.) are preserved during updates\n- A GitHub Actions workflow runs weekly and creates PRs with updates\n\n### Audit / Security scans\n\nTo run lightweight security scans against the catalog:\n\n```bash\npython scripts/audit_catalog.py --dry-run --mode security --max-images 5\n```\n\nBefore scanning images from `public.ecr.aws`, authenticate once per shell session:\n\n```bash\naws ecr-public get-login-password --region us-east-1 \\\n  | docker login --username AWS --password-stdin public.ecr.aws\n```\n\nAlternatively, set `TRIVY_REGISTRY_TOKEN` to that password so Trivy can reuse it without an explicit `docker login`. Scan artifacts and cache live under `data/.audit`, which keeps reruns fast.\n\n## Website\n\nThe website is a Next.js app in `web/`:\n\n```bash\ncd web\nyarn install\nyarn dev\n```\n\n**Pages:**\n- `/guide` - 5-step wizard to find the right image based on workload, environment, frameworks, and priorities\n- `/table` - Searchable/filterable table of all images\n- `/images/[id]` - Detailed view of a single image with specs, security info, and quick-start commands\n\n[![maintained-by-zenml](assets/maintained-by-zenml.png)](https://github.com/zenml-io/zenml)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzenml-io%2Fgpudoctor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzenml-io%2Fgpudoctor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzenml-io%2Fgpudoctor/lists"}