# VGGT-MPS: 3D Vision Agent for Apple Silicon
[Release v2.0.0](https://github.com/jmanhype/vggt-mps/releases/tag/v2.0.0) [Python 3.10+](https://www.python.org/downloads/) [MIT License](LICENSE) [Metal](https://developer.apple.com/metal/)

**VGGT (Visual Geometry Grounded Transformer) optimized for Apple Silicon with Metal Performance Shaders (MPS)**
Transform single or multi-view images into rich 3D reconstructions using Facebook Research's VGGT model, now accelerated on M1/M2/M3 Macs.
## Release v2.0.0
**Major Update**: Complete packaging overhaul with unified CLI, PyPI-ready distribution, and production-grade tooling!
## What's New in v2.0.0
### Major Changes
- **Unified CLI**: New `vggt` command with subcommands for all operations
- **Professional Packaging**: PyPI-ready with `pyproject.toml`, proper src layout
- **Web Interface**: Gradio UI for interactive 3D reconstruction (`vggt web`)
- **Enhanced Testing**: Comprehensive test suite with MPS and sparse attention tests
- **Modern Tooling**: UV support, Makefile automation, GitHub Actions CI/CD

### Core Features
- **MPS Acceleration**: Full GPU acceleration on Apple Silicon using Metal Performance Shaders
- **Sparse Attention**: O(n) memory scaling for city-scale reconstruction (100x savings!)
- **Multi-View 3D Reconstruction**: Generate depth maps, point clouds, and camera poses from images
- **MCP Integration**: Model Context Protocol server for Claude Desktop integration
- **5GB Model**: Efficient 1B-parameter model that runs smoothly on Apple Silicon
- **Multiple Export Formats**: PLY, OBJ, GLB for 3D point clouds

## What VGGT Does
VGGT reconstructs 3D scenes from images by predicting:
- **Depth Maps**: Per-pixel depth estimation
- **Camera Poses**: 6DOF camera parameters
- **3D Point Clouds**: Dense 3D reconstruction
- **Confidence Maps**: Reliability scores for predictions

## Requirements
- Apple Silicon Mac (M1/M2/M3)
- Python 3.10+
- 8GB+ RAM
- 6GB disk space for the model

## Quick Start
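Before installing, you can sanity-check these requirements from Python. The thresholds below are the ones listed above; the script itself is just an illustrative sketch:

```python
import platform
import shutil
import sys

# Apple Silicon reports 'arm64'; Intel Macs report 'x86_64'
print("machine:", platform.machine())

# Python 3.10+ is required
print("python ok:", sys.version_info >= (3, 10))

# roughly 6 GB of free disk space is needed for the model weights
free_gb = shutil.disk_usage(".").free / 1024**3
print("free disk (GB):", round(free_gb, 1))
```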
### 1. Installation Options
#### Option A: Install from PyPI (Coming Soon)
```bash
# Install from PyPI (when published)
pip install vggt-mps

# Download model weights (5GB)
vggt download
```

#### Option B: Install from Source with UV (Recommended for Development)
```bash
git clone https://github.com/jmanhype/vggt-mps.git
cd vggt-mps

# Install with uv (10-100x faster than pip!)
make install

# Or manually with uv
uv pip install -e .
```

#### Option C: Traditional pip install from Source
```bash
git clone https://github.com/jmanhype/vggt-mps.git
cd vggt-mps

# Create virtual environment
python -m venv vggt-env
source vggt-env/bin/activate

# Install dependencies
pip install -r requirements.txt
```

### 2. Download Model Weights
```bash
# Download the 5GB VGGT model
vggt download

# Or if running from source:
python main.py download
```

Or manually download from [Hugging Face](https://huggingface.co/facebook/VGGT-1B/resolve/main/model.pt).
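If you download the weights manually, a simple size check can catch truncated downloads. The path below is a placeholder for wherever you saved the file, and the ~5 GB figure is the model size noted above; this is an illustrative sketch, not part of the project's CLI:

```python
from pathlib import Path

def looks_complete(path: Path, min_gb: float = 4.5) -> bool:
    """Return True if the checkpoint exists and is plausibly complete.

    The VGGT-1B checkpoint is roughly 5 GB on disk, so anything much
    smaller is likely a truncated or failed download.
    """
    return path.exists() and path.stat().st_size > min_gb * 1024**3

# placeholder path: point this at your downloaded model.pt
print(looks_complete(Path("models/model.pt")))
```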
### 3. Test MPS Support
```bash
# Test MPS acceleration
vggt test --suite mps

# Or from source:
python main.py test --suite mps
```

Expected output:
```
✓ MPS (Metal Performance Shaders) available!
Running on Apple Silicon GPU
✓ Model weights loaded to mps
✓ MPS operations working correctly!
```

### 4. Setup Environment (Optional)
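The `.env` file created in this step is a plain `KEY=VALUE` file. A stdlib-only sketch of how such a file is typically loaded (the project's actual settings handling lives in `src/config.py`; `load_env` here is just an illustration):

```python
import os

def load_env(path: str = ".env") -> dict:
    """Parse KEY=VALUE lines from a .env file, skipping blanks and comments."""
    values = {}
    try:
        with open(path) as fh:
            for line in fh:
                line = line.strip()
                if line and not line.startswith("#") and "=" in line:
                    key, _, value = line.partition("=")
                    values[key.strip()] = value.strip()
    except FileNotFoundError:
        pass  # no .env file: fall back to the process environment only
    return values

# real environment variables take precedence over the file
settings = {**load_env(), **os.environ}
```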
```bash
# Copy environment configuration
cp .env.example .env

# Edit .env with your settings
nano .env
```

## Usage
### CLI Commands (v2.0.0)
All functionality is accessible through the unified `vggt` command:
```bash
# Quick demo with sample images
vggt demo

# Demo with kitchen dataset (4 images)
vggt demo --kitchen --images 4

# Process your own images
vggt reconstruct data/*.jpg

# Use sparse attention for large scenes
vggt reconstruct --sparse data/*.jpg

# Export to specific format
vggt reconstruct --export ply data/*.jpg

# Launch interactive web interface
vggt web

# Open on specific port with public link
vggt web --port 8080 --share

# Run comprehensive tests
vggt test --suite all

# Test sparse attention specifically
vggt test --suite sparse

# Benchmark performance
vggt benchmark --compare

# Download model weights
vggt download
```

### From Source (Development)
If running from source without installation:
```bash
python main.py demo
python main.py reconstruct data/*.jpg
python main.py web
python main.py test --suite mps
python main.py benchmark --compare
```

## MCP Server Integration
### Add to Claude Desktop
1. Edit `~/Library/Application Support/Claude/claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "vggt-agent": {
      "command": "uv",
      "args": [
        "run",
        "--python",
        "/path/to/vggt-mps/vggt-env/bin/python",
        "--with",
        "fastmcp",
        "fastmcp",
        "run",
        "/path/to/vggt-mps/src/vggt_mps_mcp.py"
      ]
    }
  }
}
```

2. Restart Claude Desktop.
### Available MCP Tools
- `vggt_quick_start_inference` - Quick 3D reconstruction from images
- `vggt_extract_video_frames` - Extract frames from video
- `vggt_process_images` - Full VGGT pipeline
- `vggt_create_3d_scene` - Generate GLB 3D files
- `vggt_reconstruct_3d_scene` - Multi-view reconstruction
- `vggt_visualize_reconstruction` - Create visualizations

## Project Structure
```
vggt-mps/
├── main.py                      # Single entry point
├── setup.py                     # Package installation
├── requirements.txt             # Dependencies
├── .env.example                 # Environment configuration
│
├── src/                         # Source code
│   ├── config.py                # Centralized configuration
│   ├── vggt_core.py             # Core VGGT processing
│   ├── vggt_sparse_attention.py # Sparse attention (O(n) scaling)
│   ├── visualization.py         # 3D visualization utilities
│   │
│   ├── commands/                # CLI commands
│   │   ├── demo.py              # Demo command
│   │   ├── reconstruct.py       # Reconstruction command
│   │   ├── test_runner.py       # Test runner
│   │   ├── benchmark.py         # Performance benchmarking
│   │   └── web_interface.py     # Gradio web app
│   │
│   └── utils/                   # Utilities
│       ├── model_loader.py      # Model management
│       ├── image_utils.py       # Image processing
│       └── export.py            # Export to PLY/OBJ/GLB
│
├── tests/                       # Organized test suite
│   ├── test_mps.py              # MPS functionality tests
│   ├── test_sparse.py           # Sparse attention tests
│   └── test_integration.py      # End-to-end tests
│
├── data/                        # Input data directory
├── outputs/                     # Output directory
├── models/                      # Model storage
│
├── docs/                        # Documentation
│   ├── API.md                   # API documentation
│   ├── SPARSE_ATTENTION.md      # Technical details
│   └── BENCHMARKS.md            # Performance results
│
└── LICENSE                      # MIT License
```

## Usage Examples
### Process Images
```python
from src.tools.readme import vggt_quick_start_inference

result = vggt_quick_start_inference(
image_directory="./tmp/inputs",
device="mps", # Use Apple Silicon GPU
max_images=4,
save_outputs=True
)
```

### Extract Video Frames
```python
from src.tools.demo_gradio import vggt_extract_video_frames

result = vggt_extract_video_frames(
video_path="input_video.mp4",
frame_interval_seconds=1.0
)
```

### Create 3D Scene
```python
from src.tools.demo_viser import vggt_reconstruct_3d_scene

result = vggt_reconstruct_3d_scene(
images_dir="./tmp/inputs",
device_type="mps",
confidence_threshold=0.5
)
```

## Sparse Attention - NEW!
**City-scale 3D reconstruction is now possible!** We've implemented Gabriele Berton's research idea for O(n) memory scaling.
### Key Benefits
- **100x memory savings** for 1000 images
- **No retraining required** - patches existing VGGT at runtime
- **Identical outputs** to regular VGGT (0.000000 difference)
- **MegaLoc covisibility** detection for smart attention masking

### Usage
```python
from src.vggt_sparse_attention import make_vggt_sparse

# Convert any VGGT to sparse in 1 line
sparse_vggt = make_vggt_sparse(regular_vggt, device="mps")

# Same usage, O(n) memory instead of O(n²)
output = sparse_vggt(images) # Handles 1000+ images!
```

### Memory Scaling
| Images | Regular | Sparse | Savings |
|--------|---------|--------|---------|
| 100 | O(10K) | O(1K) | **10x** |
| 500 | O(250K) | O(5K) | **50x** |
| 1000   | O(1M)   | O(10K) | **100x** |

**See full results:** [docs/SPARSE_ATTENTION_RESULTS.md](docs/SPARSE_ATTENTION_RESULTS.md)
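To make the scaling above concrete, here is a hypothetical sketch of block-sparse attention masking over covisible image pairs. `build_sparse_mask`, `covisible_pairs`, and the per-image patch count are illustrative assumptions, not the project's actual API; the real runtime patch lives in `src/vggt_sparse_attention.py`:

```python
import torch

def build_sparse_mask(n_images, covisible_pairs, patches_per_image=4):
    """Boolean attention mask: True where attention is allowed.

    Dense VGGT attends all-to-all across images (O(n^2) image pairs).
    Restricting attention to each image plus its covisible neighbours
    keeps the number of allowed pairs O(n) when each image has a
    bounded number of neighbours.
    """
    n_tokens = n_images * patches_per_image
    mask = torch.zeros(n_tokens, n_tokens, dtype=torch.bool)

    def block(i):  # token range belonging to image i
        return slice(i * patches_per_image, (i + 1) * patches_per_image)

    for i in range(n_images):            # every image attends to itself
        mask[block(i), block(i)] = True
    for i, j in covisible_pairs:         # and to its covisible neighbours
        mask[block(i), block(j)] = True
        mask[block(j), block(i)] = True  # covisibility is symmetric
    return mask

# 4 images in a chain: each attends only to itself and adjacent views,
# so the number of allowed image pairs grows linearly with image count
mask = build_sparse_mask(4, [(0, 1), (1, 2), (2, 3)])
print(mask.shape, mask.float().mean().item())
```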
## Technical Details
### MPS Optimizations
- **Device Detection**: Auto-detects MPS availability
- **Dtype Selection**: Uses float32 for optimal MPS performance
- **Autocast Handling**: CUDA autocast disabled for MPS
- **Memory Management**: Efficient tensor operations on Metal

### Model Architecture
- **Parameters**: 1B (5GB on disk)
- **Input**: Multi-view images
- **Output**: Depth, camera poses, 3D points
- **Resolution**: 518x518 (VGGT), up to 1024x1024 (input)

## Troubleshooting
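Most of the issues below come down to device selection. A minimal sketch using the standard PyTorch API, falling back to CPU when MPS is unavailable:

```python
import torch

# Prefer Apple's Metal backend when available, otherwise fall back to CPU
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

# float32 is the safe default dtype on MPS (see "MPS Optimizations" above)
x = torch.ones(4, 4, dtype=torch.float32, device=device)
print(device, x.sum().item())
```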
### MPS Not Available
```bash
# Check PyTorch MPS support
python -c "import torch; print(torch.backends.mps.is_available())"
```

### Model Loading Issues
```bash
# Verify model file
ls -lh repo/vggt/vggt_model.pt
# Should show ~5GB file
```

### Memory Issues
- Reduce batch size
- Lower resolution
- Use CPU fallback

## References
- [VGGT Paper](https://arxiv.org/pdf/2507.04009)
- [Facebook Research VGGT](https://github.com/facebookresearch/vggt)
- [Hugging Face Model](https://huggingface.co/facebook/VGGT-1B)

## Documentation
- **[Development Guide](DEVELOPMENT.md)** - Setting up your dev environment
- **[Publishing Guide](PUBLISHING.md)** - PyPI release process
- **[Contributing Guide](CONTRIBUTING.md)** - How to contribute
- **[API Documentation](docs/)** - Detailed API reference
- **[Examples](examples/)** - Code examples and demos

## Release Notes
### v2.0.0 (Latest)
- Unified CLI with `vggt` command
- Professional Python packaging (PyPI-ready)
- Gradio web interface
- Comprehensive test suite
- Modern tooling (UV, Makefile, GitHub Actions)
- Complete documentation overhaul

See the [full changelog](https://github.com/jmanhype/vggt-mps/releases/tag/v2.0.0).
## Contributing
We follow a lightweight Git Flow:
- `main` holds the latest stable release and is protected.
- `develop` is the default integration branch for day-to-day work.

When contributing:
1. Create your feature branch from `develop` (`git switch develop && git switch -c feature/my-change`).
2. Keep commits focused and include tests or documentation updates when relevant.
3. Open your pull request against `develop`; maintainers will promote changes to `main` during releases.

Please open issues for bugs or feature requests before starting large efforts. Full details, testing expectations, and the release process live in [`CONTRIBUTING.md`](CONTRIBUTING.md).
## License
MIT License - See LICENSE file for details
## Acknowledgments
- Facebook Research for VGGT
- Apple for Metal Performance Shaders
- PyTorch team for the MPS backend

---
**Made with ❤️ for Apple Silicon by the AI community**