{"id":31375970,"url":"https://github.com/jmanhype/vggt-mps","last_synced_at":"2025-09-28T02:52:51.647Z","repository":{"id":315446056,"uuid":"1059538898","full_name":"jmanhype/vggt-mps","owner":"jmanhype","description":"VGGT 3D Vision Agent optimized for Apple Silicon with Metal Performance Shaders","archived":false,"fork":false,"pushed_at":"2025-09-18T17:15:40.000Z","size":35801,"stargazers_count":2,"open_issues_count":0,"forks_count":3,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-18T18:27:24.838Z","etag":null,"topics":["3d-reconstruction","apple-silicon","claude-desktop","computer-vision","depth-estimation","m1","m2","m3","macos","mcp","metal-performance-shaders","mps","pytorch","vggt"],"latest_commit_sha":null,"homepage":"https://github.com/facebookresearch/vggt","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jmanhype.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-18T15:23:05.000Z","updated_at":"2025-09-18T17:15:44.000Z","dependencies_parsed_at":"2025-09-18T18:27:29.105Z","dependency_job_id":"94a844c6-657a-4ea7-9c9c-0c59b492c5f1","html_url":"https://github.com/jmanhype/vggt-mps","commit_stats":null,"previous_names":["jmanhype/vggt-mps"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/jmanhype/vggt-mps","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jmanhype%2Fvggt-mps","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jmanhype%2Fvggt-mps/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jmanhype%2Fvggt-mps/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jmanhype%2Fvggt-mps/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jmanhype","download_url":"https://codeload.github.com/jmanhype/vggt-mps/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jmanhype%2Fvggt-mps/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":277318707,"owners_count":25798184,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-28T02:00:08.834Z","response_time":79,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d-reconstruction","apple-silicon","claude-desktop","computer-vision","depth-estimation","m1","m2","m3","macos","mcp","metal-performance-shaders","mps","pytorch","vggt"],"created_at":"2025-09-28T02:52:50.379Z","updated_at":"2025-09-28T02:52:51.638Z","avatar_url":"https://github.com/jmanhype.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# VGGT-MPS: 3D Vision Agent for Apple Silicon\n\n[![Version](https://img.shields.io/badge/version-2.0.0-blue)](https://github.com/jmanhype/vggt-mps/releases/tag/v2.0.0)\n[![Python](https://img.shields.io/badge/python-3.10%2B-blue)](https://www.python.org/downloads/)\n[![License](https://img.shields.io/badge/license-MIT-green)](LICENSE)\n[![MPS](https://img.shields.io/badge/Apple%20Silicon-M1%2FM2%2FM3-black)](https://developer.apple.com/metal/)\n\n🍎 **VGGT (Visual Geometry Grounded Transformer) optimized for Apple Silicon with Metal Performance Shaders (MPS)**\n\nTransform single or multi-view images into rich 3D reconstructions using Facebook Research's VGGT model, now accelerated on M1/M2/M3 Macs.\n\n## 🎉 Release v2.0.0\n\n**Major Update**: Complete packaging overhaul with unified CLI, PyPI-ready distribution, and production-grade tooling!\n\n## ✨ What's New in v2.0.0\n\n### 🎯 Major Changes\n- **Unified CLI**: New `vggt` command with subcommands for all operations\n- **Professional Packaging**: PyPI-ready with `pyproject.toml`, proper src layout\n- **Web Interface**: Gradio UI for interactive 3D reconstruction (`vggt web`)\n- **Enhanced Testing**: Comprehensive test suite with MPS and sparse attention tests\n- **Modern Tooling**: UV support, Makefile automation, GitHub Actions CI/CD\n\n### 🚀 Core Features\n- **MPS Acceleration**: Full GPU acceleration on Apple Silicon using Metal Performance Shaders\n- **⚡ Sparse Attention**: O(n) memory scaling for city-scale reconstruction (100x savings!)\n- **🎥 Multi-View 3D Reconstruction**: Generate depth maps, point clouds, and camera poses from images\n- **🔧 MCP Integration**: Model Context Protocol server for Claude Desktop integration\n- **📦 5GB Model**: Efficient 1B parameter model that runs smoothly on Apple Silicon\n- **🛠️ Multiple Export Formats**: PLY, OBJ, GLB for 3D point clouds\n\n## 🎯 What VGGT Does\n\nVGGT reconstructs 3D scenes from images by predicting:\n- **Depth Maps**: Per-pixel depth estimation\n- **Camera Poses**: 6DOF camera parameters\n- **3D Point Clouds**: Dense 3D reconstruction\n- **Confidence Maps**: Reliability scores for predictions\n\n## 📋 Requirements\n\n- Apple Silicon Mac (M1/M2/M3)\n- Python 3.10+\n- 8GB+ RAM\n- 6GB disk space for model\n\n## 🚀 Quick Start\n\n### Installation Options\n\n#### Option A: Install from PyPI (Coming Soon)\n\n```bash\n# Install from PyPI (when published)\npip install vggt-mps\n\n# Download model weights (5GB)\nvggt download\n```\n\n#### Option B: Install from Source with UV (Recommended for Development)\n\n```bash\ngit clone https://github.com/jmanhype/vggt-mps.git\ncd vggt-mps\n\n# Install with uv (10-100x faster than pip!)\nmake install\n\n# Or manually with uv\nuv pip install -e .\n```\n\n#### Option C: Traditional pip install from Source\n\n```bash\ngit clone https://github.com/jmanhype/vggt-mps.git\ncd vggt-mps\n\n# Create virtual environment\npython -m venv vggt-env\nsource vggt-env/bin/activate\n\n# Install dependencies\npip install -r requirements.txt\n```\n\n### 2. Download Model Weights\n\n```bash\n# Download the 5GB VGGT model\nvggt download\n\n# Or if running from source:\npython main.py download\n```\n\nOr manually download from [Hugging Face](https://huggingface.co/facebook/VGGT-1B/resolve/main/model.pt)\n\n### 3. Test MPS Support\n\n```bash\n# Test MPS acceleration\nvggt test --suite mps\n\n# Or from source:\npython main.py test --suite mps\n```\n\nExpected output:\n```\n✅ MPS (Metal Performance Shaders) available!\n   Running on Apple Silicon GPU\n✅ Model weights loaded to mps\n✅ MPS operations working correctly!\n```\n\n### 4. Setup Environment (Optional)\n\n```bash\n# Copy environment configuration\ncp .env.example .env\n\n# Edit .env with your settings\nnano .env\n```\n\n## 📖 Usage\n\n### CLI Commands (v2.0.0)\n\nAll functionality is accessible through the unified `vggt` command:\n\n```bash\n# Quick demo with sample images\nvggt demo\n\n# Demo with kitchen dataset (4 images)\nvggt demo --kitchen --images 4\n\n# Process your own images\nvggt reconstruct data/*.jpg\n\n# Use sparse attention for large scenes\nvggt reconstruct --sparse data/*.jpg\n\n# Export to specific format\nvggt reconstruct --export ply data/*.jpg\n\n# Launch interactive web interface\nvggt web\n\n# Open on specific port with public link\nvggt web --port 8080 --share\n\n# Run comprehensive tests\nvggt test --suite all\n\n# Test sparse attention specifically\nvggt test --suite sparse\n\n# Benchmark performance\nvggt benchmark --compare\n\n# Download model weights\nvggt download\n```\n\n### From Source (Development)\n\nIf running from source without installation:\n\n```bash\npython main.py demo\npython main.py reconstruct data/*.jpg\npython main.py web\npython main.py test --suite mps\npython main.py benchmark --compare\n```\n\n## 🔧 MCP Server Integration\n\n### Add to Claude Desktop\n\n1. Edit `~/Library/Application Support/Claude/claude_desktop_config.json`:\n\n```json\n{\n  \"mcpServers\": {\n    \"vggt-agent\": {\n      \"command\": \"uv\",\n      \"args\": [\n        \"run\",\n        \"--python\",\n        \"/path/to/vggt-mps/vggt-env/bin/python\",\n        \"--with\",\n        \"fastmcp\",\n        \"fastmcp\",\n        \"run\",\n        \"/path/to/vggt-mps/src/vggt_mps_mcp.py\"\n      ]\n    }\n  }\n}\n```\n\n2. Restart Claude Desktop\n\n### Available MCP Tools\n\n- `vggt_quick_start_inference` - Quick 3D reconstruction from images\n- `vggt_extract_video_frames` - Extract frames from video\n- `vggt_process_images` - Full VGGT pipeline\n- `vggt_create_3d_scene` - Generate GLB 3D files\n- `vggt_reconstruct_3d_scene` - Multi-view reconstruction\n- `vggt_visualize_reconstruction` - Create visualizations\n\n## 📁 Project Structure\n\n```\nvggt-mps/\n├── main.py                      # Single entry point\n├── setup.py                     # Package installation\n├── requirements.txt             # Dependencies\n├── .env.example                 # Environment configuration\n│\n├── src/                         # Source code\n│   ├── config.py               # Centralized configuration\n│   ├── vggt_core.py            # Core VGGT processing\n│   ├── vggt_sparse_attention.py # Sparse attention (O(n) scaling)\n│   ├── visualization.py        # 3D visualization utilities\n│   │\n│   ├── commands/               # CLI commands\n│   │   ├── demo.py            # Demo command\n│   │   ├── reconstruct.py     # Reconstruction command\n│   │   ├── test_runner.py     # Test runner\n│   │   ├── benchmark.py       # Performance benchmarking\n│   │   └── web_interface.py   # Gradio web app\n│   │\n│   └── utils/                  # Utilities\n│       ├── model_loader.py    # Model management\n│       ├── image_utils.py     # Image processing\n│       └── export.py          # Export to PLY/OBJ/GLB\n│\n├── tests/                       # Organized test suite\n│   ├── test_mps.py            # MPS functionality tests\n│   ├── test_sparse.py         # Sparse attention tests\n│   └── test_integration.py    # End-to-end tests\n│\n├── data/                        # Input data directory\n├── outputs/                     # Output directory\n├── models/                      # Model storage\n│\n├── docs/                        # Documentation\n│   ├── API.md                  # API documentation\n│   ├── SPARSE_ATTENTION.md    # Technical details\n│   └── BENCHMARKS.md          # Performance results\n│\n└── LICENSE                      # MIT License\n```\n\n## 🖼️ Usage Examples\n\n### Process Images\n\n```python\nfrom src.tools.readme import vggt_quick_start_inference\n\nresult = vggt_quick_start_inference(\n    image_directory=\"./tmp/inputs\",\n    device=\"mps\",  # Use Apple Silicon GPU\n    max_images=4,\n    save_outputs=True\n)\n```\n\n### Extract Video Frames\n\n```python\nfrom src.tools.demo_gradio import vggt_extract_video_frames\n\nresult = vggt_extract_video_frames(\n    video_path=\"input_video.mp4\",\n    frame_interval_seconds=1.0\n)\n```\n\n### Create 3D Scene\n\n```python\nfrom src.tools.demo_viser import vggt_reconstruct_3d_scene\n\nresult = vggt_reconstruct_3d_scene(\n    images_dir=\"./tmp/inputs\",\n    device_type=\"mps\",\n    confidence_threshold=0.5\n)\n```\n\n## ⚡ Sparse Attention - NEW!\n\n**City-scale 3D reconstruction is now possible!** We've implemented Gabriele Berton's research idea for O(n) memory scaling.\n\n### 🎯 Key Benefits\n- **100x memory savings** for 1000 images\n- **No retraining required** - patches existing VGGT at runtime\n- **Identical outputs** to regular VGGT (0.000000 difference)\n- **MegaLoc covisibility** detection for smart attention masking\n\n### 🚀 Usage\n```python\nfrom src.vggt_sparse_attention import make_vggt_sparse\n\n# Convert any VGGT to sparse in 1 line\nsparse_vggt = make_vggt_sparse(regular_vggt, device=\"mps\")\n\n# Same usage, O(n) memory instead of O(n²)\noutput = sparse_vggt(images)  # Handles 1000+ images!\n```\n\n### 📊 Memory Scaling\n| Images | Regular | Sparse | Savings |\n|--------|---------|--------|---------|\n| 100    | O(10K)  | O(1K)  | **10x** |\n| 500    | O(250K) | O(5K)  | **50x** |\n| 1000   | O(1M)   | O(10K) | **100x** |\n\n**See full results:** [docs/SPARSE_ATTENTION_RESULTS.md](docs/SPARSE_ATTENTION_RESULTS.md)\n\n## 🔬 Technical Details\n\n### MPS Optimizations\n\n- **Device Detection**: Auto-detects MPS availability\n- **Dtype Selection**: Uses float32 for optimal MPS performance\n- **Autocast Handling**: CUDA autocast disabled for MPS\n- **Memory Management**: Efficient tensor operations on Metal\n\n### Model Architecture\n\n- **Parameters**: 1B (5GB on disk)\n- **Input**: Multi-view images\n- **Output**: Depth, camera poses, 3D points\n- **Resolution**: 518x518 (VGGT), up to 1024x1024 (input)\n\n## 🐛 Troubleshooting\n\n### MPS Not Available\n\n```bash\n# Check PyTorch MPS support\npython -c \"import torch; print(torch.backends.mps.is_available())\"\n```\n\n### Model Loading Issues\n\n```bash\n# Verify model file\nls -lh repo/vggt/vggt_model.pt\n# Should show ~5GB file\n```\n\n### Memory Issues\n\n- Reduce batch size\n- Lower resolution\n- Use CPU fallback\n\n## 📚 References\n\n- [VGGT Paper](https://arxiv.org/pdf/2507.04009)\n- [Facebook Research VGGT](https://github.com/facebookresearch/vggt)\n- [Hugging Face Model](https://huggingface.co/facebook/VGGT-1B)\n\n## 📚 Documentation\n\n- **[Development Guide](DEVELOPMENT.md)** - Setting up your dev environment\n- **[Publishing Guide](PUBLISHING.md)** - PyPI release process\n- **[Contributing Guide](CONTRIBUTING.md)** - How to contribute\n- **[API Documentation](docs/)** - Detailed API reference\n- **[Examples](examples/)** - Code examples and demos\n\n## 🚀 Release Notes\n\n### v2.0.0 (Latest)\n- ✨ Unified CLI with `vggt` command\n- 📦 Professional Python packaging (PyPI-ready)\n- 🌐 Gradio web interface\n- 🧪 Comprehensive test suite\n- 🛠️ Modern tooling (UV, Makefile, GitHub Actions)\n- 📝 Complete documentation overhaul\n\nSee [full changelog](https://github.com/jmanhype/vggt-mps/releases/tag/v2.0.0)\n\n## 🤝 Contributing\n\nWe follow a lightweight Git Flow:\n\n- `main` holds the latest stable release and is protected.\n- `develop` is the default integration branch for day-to-day work.\n\nWhen contributing:\n\n1. Create your feature branch from `develop` (`git switch develop \u0026\u0026 git switch -c feature/my-change`).\n2. Keep commits focused and include tests or documentation updates when relevant.\n3. Open your pull request against `develop`; maintainers will promote changes to `main` during releases.\n\nPlease open issues for bugs or feature requests before starting large efforts. Full details, testing expectations, and the release process live in [`CONTRIBUTING.md`](CONTRIBUTING.md).\n\n## 📄 License\n\nMIT License - See LICENSE file for details\n\n## 🙏 Acknowledgments\n\n- Facebook Research for VGGT\n- Apple for Metal Performance Shaders\n- PyTorch team for MPS backend\n\n---\n\n**Made with 🍎 for Apple Silicon by the AI community**\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjmanhype%2Fvggt-mps","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjmanhype%2Fvggt-mps","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjmanhype%2Fvggt-mps/lists"}