# VGGT-MPS: 3D Vision Agent for Apple Silicon

[![Version](https://img.shields.io/badge/version-2.0.0-blue)](https://github.com/jmanhype/vggt-mps/releases/tag/v2.0.0)
[![Python](https://img.shields.io/badge/python-3.10%2B-blue)](https://www.python.org/downloads/)
[![License](https://img.shields.io/badge/license-MIT-green)](LICENSE)
[![MPS](https://img.shields.io/badge/Apple%20Silicon-M1%2FM2%2FM3-black)](https://developer.apple.com/metal/)

๐ŸŽ **VGGT (Visual Geometry Grounded Transformer) optimized for Apple Silicon with Metal Performance Shaders (MPS)**

Transform single or multi-view images into rich 3D reconstructions using Facebook Research's VGGT model, now accelerated on M1/M2/M3 Macs.

## 🎉 Release v2.0.0

**Major Update**: Complete packaging overhaul with unified CLI, PyPI-ready distribution, and production-grade tooling!

## ✨ What's New in v2.0.0

### 🎯 Major Changes
- **Unified CLI**: New `vggt` command with subcommands for all operations
- **Professional Packaging**: PyPI-ready with `pyproject.toml`, proper src layout
- **Web Interface**: Gradio UI for interactive 3D reconstruction (`vggt web`)
- **Enhanced Testing**: Comprehensive test suite with MPS and sparse attention tests
- **Modern Tooling**: UV support, Makefile automation, GitHub Actions CI/CD

### 🚀 Core Features
- **MPS Acceleration**: Full GPU acceleration on Apple Silicon using Metal Performance Shaders
- **⚡ Sparse Attention**: O(n) memory scaling for city-scale reconstruction (100x savings!)
- **🎥 Multi-View 3D Reconstruction**: Generate depth maps, point clouds, and camera poses from images
- **🔧 MCP Integration**: Model Context Protocol server for Claude Desktop integration
- **📦 5GB Model**: Efficient 1B-parameter model that runs smoothly on Apple Silicon
- **🛠️ Multiple Export Formats**: PLY, OBJ, and GLB for 3D point clouds (see the sketch below)

## 🎯 What VGGT Does

VGGT reconstructs 3D scenes from images by predicting:
- **Depth Maps**: Per-pixel depth estimation
- **Camera Poses**: 6DOF camera parameters
- **3D Point Clouds**: Dense 3D reconstruction
- **Confidence Maps**: Reliability scores for predictions
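
As a rough sketch of what a single forward pass yields (module paths and output keys follow the upstream facebookresearch/vggt examples and may differ between versions, so treat the key names as illustrative):

```python
import torch
from vggt.models.vggt import VGGT
from vggt.utils.load_fn import load_and_preprocess_images

# Prefer the Apple Silicon GPU, fall back to CPU.
device = "mps" if torch.backends.mps.is_available() else "cpu"

model = VGGT.from_pretrained("facebook/VGGT-1B").to(device)
images = load_and_preprocess_images(["data/img1.jpg", "data/img2.jpg"]).to(device)

with torch.no_grad():
    predictions = model(images)

# Illustrative output keys (check the upstream docs for exact names):
#   predictions["depth"]        - per-pixel depth maps
#   predictions["pose_enc"]     - encoded camera parameters
#   predictions["world_points"] - dense 3D points
#   predictions["depth_conf"]   - confidence maps
```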

## 📋 Requirements

- Apple Silicon Mac (M1/M2/M3)
- Python 3.10+
- 8GB+ RAM
- 6GB disk space for model

## 🚀 Quick Start

### 1. Installation Options

#### Option A: Install from PyPI (Coming Soon)

```bash
# Install from PyPI (when published)
pip install vggt-mps

# Download model weights (5GB)
vggt download
```

#### Option B: Install from Source with UV (Recommended for Development)

```bash
git clone https://github.com/jmanhype/vggt-mps.git
cd vggt-mps

# Install with uv (10-100x faster than pip!)
make install

# Or manually with uv
uv pip install -e .
```

#### Option C: Traditional pip install from Source

```bash
git clone https://github.com/jmanhype/vggt-mps.git
cd vggt-mps

# Create virtual environment
python -m venv vggt-env
source vggt-env/bin/activate

# Install dependencies
pip install -r requirements.txt
```

### 2. Download Model Weights

```bash
# Download the 5GB VGGT model
vggt download

# Or if running from source:
python main.py download
```

Or download it manually from [Hugging Face](https://huggingface.co/facebook/VGGT-1B/resolve/main/model.pt).

### 3. Test MPS Support

```bash
# Test MPS acceleration
vggt test --suite mps

# Or from source:
python main.py test --suite mps
```

Expected output:
```
✅ MPS (Metal Performance Shaders) available!
Running on Apple Silicon GPU
✅ Model weights loaded to mps
✅ MPS operations working correctly!
```

### 4. Setup Environment (Optional)

```bash
# Copy environment configuration
cp .env.example .env

# Edit .env with your settings
nano .env
```

## 📖 Usage

### CLI Commands (v2.0.0)

All functionality is accessible through the unified `vggt` command:

```bash
# Quick demo with sample images
vggt demo

# Demo with kitchen dataset (4 images)
vggt demo --kitchen --images 4

# Process your own images
vggt reconstruct data/*.jpg

# Use sparse attention for large scenes
vggt reconstruct --sparse data/*.jpg

# Export to specific format
vggt reconstruct --export ply data/*.jpg

# Launch interactive web interface
vggt web

# Open on specific port with public link
vggt web --port 8080 --share

# Run comprehensive tests
vggt test --suite all

# Test sparse attention specifically
vggt test --suite sparse

# Benchmark performance
vggt benchmark --compare

# Download model weights
vggt download
```

### From Source (Development)

If running from source without installation:

```bash
python main.py demo
python main.py reconstruct data/*.jpg
python main.py web
python main.py test --suite mps
python main.py benchmark --compare
```

## 🔧 MCP Server Integration

### Add to Claude Desktop

1. Edit `~/Library/Application Support/Claude/claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "vggt-agent": {
      "command": "uv",
      "args": [
        "run",
        "--python",
        "/path/to/vggt-mps/vggt-env/bin/python",
        "--with",
        "fastmcp",
        "fastmcp",
        "run",
        "/path/to/vggt-mps/src/vggt_mps_mcp.py"
      ]
    }
  }
}
```

2. Restart Claude Desktop

### Available MCP Tools

- `vggt_quick_start_inference` - Quick 3D reconstruction from images
- `vggt_extract_video_frames` - Extract frames from video
- `vggt_process_images` - Full VGGT pipeline
- `vggt_create_3d_scene` - Generate GLB 3D files
- `vggt_reconstruct_3d_scene` - Multi-view reconstruction
- `vggt_visualize_reconstruction` - Create visualizations
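
These tools are exposed through FastMCP. A minimal sketch of the registration pattern (the tool body here is illustrative, not the actual implementation in `src/vggt_mps_mcp.py`):

```python
from fastmcp import FastMCP

mcp = FastMCP("vggt-agent")

@mcp.tool()
def vggt_quick_start_inference(image_directory: str, device: str = "mps") -> dict:
    """Quick 3D reconstruction from the images in a directory."""
    # Placeholder body -- the real tool runs the VGGT pipeline.
    return {"status": "ok", "images": image_directory, "device": device}

if __name__ == "__main__":
    mcp.run()
```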

## ๐Ÿ“ Project Structure

```
vggt-mps/
├── main.py                      # Single entry point
├── setup.py                     # Package installation
├── requirements.txt             # Dependencies
├── .env.example                 # Environment configuration
│
├── src/                         # Source code
│   ├── config.py                # Centralized configuration
│   ├── vggt_core.py             # Core VGGT processing
│   ├── vggt_sparse_attention.py # Sparse attention (O(n) scaling)
│   ├── visualization.py         # 3D visualization utilities
│   │
│   ├── commands/                # CLI commands
│   │   ├── demo.py              # Demo command
│   │   ├── reconstruct.py       # Reconstruction command
│   │   ├── test_runner.py       # Test runner
│   │   ├── benchmark.py         # Performance benchmarking
│   │   └── web_interface.py     # Gradio web app
│   │
│   └── utils/                   # Utilities
│       ├── model_loader.py      # Model management
│       ├── image_utils.py       # Image processing
│       └── export.py            # Export to PLY/OBJ/GLB
│
├── tests/                       # Organized test suite
│   ├── test_mps.py              # MPS functionality tests
│   ├── test_sparse.py           # Sparse attention tests
│   └── test_integration.py      # End-to-end tests
│
├── data/                        # Input data directory
├── outputs/                     # Output directory
├── models/                      # Model storage
│
├── docs/                        # Documentation
│   ├── API.md                   # API documentation
│   ├── SPARSE_ATTENTION.md      # Technical details
│   └── BENCHMARKS.md            # Performance results
│
└── LICENSE                      # MIT License
```

## ๐Ÿ–ผ๏ธ Usage Examples

### Process Images

```python
from src.tools.readme import vggt_quick_start_inference

result = vggt_quick_start_inference(
    image_directory="./tmp/inputs",
    device="mps",  # Use Apple Silicon GPU
    max_images=4,
    save_outputs=True,
)
```

### Extract Video Frames

```python
from src.tools.demo_gradio import vggt_extract_video_frames

result = vggt_extract_video_frames(
    video_path="input_video.mp4",
    frame_interval_seconds=1.0,
)
```

### Create 3D Scene

```python
from src.tools.demo_viser import vggt_reconstruct_3d_scene

result = vggt_reconstruct_3d_scene(
    images_dir="./tmp/inputs",
    device_type="mps",
    confidence_threshold=0.5,
)
```

## ⚡ Sparse Attention - NEW!

**City-scale 3D reconstruction is now possible!** We've implemented Gabriele Berton's research idea for O(n) memory scaling.

### 🎯 Key Benefits
- **100x memory savings** for 1000 images
- **No retraining required** - patches existing VGGT at runtime
- **Identical outputs** to regular VGGT (0.000000 difference)
- **MegaLoc covisibility** detection for smart attention masking

### 🚀 Usage
```python
from src.vggt_sparse_attention import make_vggt_sparse

# Convert any VGGT to sparse in 1 line
sparse_vggt = make_vggt_sparse(regular_vggt, device="mps")

# Same usage, O(n) memory instead of O(n²)
output = sparse_vggt(images)  # Handles 1000+ images!
```

### 📊 Memory Scaling
| Images | Regular (O(n²)) | Sparse (O(n)) | Savings |
|--------|-----------------|---------------|---------|
| 100    | ~10K pairs      | ~1K pairs     | **10x** |
| 500    | ~250K pairs     | ~5K pairs     | **50x** |
| 1000   | ~1M pairs       | ~10K pairs    | **100x** |

**See full results:** [docs/SPARSE_ATTENTION_RESULTS.md](docs/SPARSE_ATTENTION_RESULTS.md)

## 🔬 Technical Details

### MPS Optimizations

- **Device Detection**: Auto-detects MPS availability
- **Dtype Selection**: Uses float32 for optimal MPS performance
- **Autocast Handling**: CUDA autocast disabled for MPS
- **Memory Management**: Efficient tensor operations on Metal
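
These optimizations amount to a few lines of standard PyTorch. A minimal sketch of the pattern, independent of the project's own helpers in `src/config.py`:

```python
import torch

# Auto-detect MPS, fall back to CPU.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

# float32 is the safe default dtype on MPS; half precision is more limited.
dtype = torch.float32

# Plain eager ops on the Metal device -- no CUDA autocast context on MPS.
x = torch.randn(4, 3, 518, 518, device=device, dtype=dtype)
with torch.no_grad():
    y = (x * x).mean()
print(f"device={device}, result={y.item():.4f}")
```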

### Model Architecture

- **Parameters**: 1B (5GB on disk)
- **Input**: Multi-view images
- **Output**: Depth, camera poses, 3D points
- **Resolution**: 518x518 working resolution (inputs up to 1024x1024)
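
As an illustration of the resolution handling, a hedged preprocessing sketch that resizes an input to the 518x518 working resolution (the real pipeline's resize strategy may differ, e.g. aspect-preserving):

```python
from PIL import Image
import torchvision.transforms as T

VGGT_RESOLUTION = 518  # model's working resolution

preprocess = T.Compose([
    T.Resize((VGGT_RESOLUTION, VGGT_RESOLUTION)),
    T.ToTensor(),  # (3, 518, 518) float32 in [0, 1]
])

image = Image.open("data/example.jpg").convert("RGB")
batch = preprocess(image).unsqueeze(0)  # add batch dim -> (1, 3, 518, 518)
```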

## ๐Ÿ› Troubleshooting

### MPS Not Available

```bash
# Check PyTorch MPS support
python -c "import torch; print(torch.backends.mps.is_available())"
```

### Model Loading Issues

```bash
# Verify model file
ls -lh repo/vggt/vggt_model.pt
# Should show ~5GB file
```

### Memory Issues

- Reduce batch size (e.g. process images in chunks, as sketched below)
- Lower resolution
- Use CPU fallback
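
For example, reducing the effective batch size can be as simple as processing images in chunks; a hedged sketch where `process_batch` is a hypothetical stand-in for the reconstruction call:

```python
def process_batch(paths: list[str]) -> dict:
    # Hypothetical stand-in for a VGGT reconstruction call.
    return {"n_images": len(paths)}

def reconstruct_in_chunks(image_paths: list[str], chunk_size: int = 8) -> list[dict]:
    """Run reconstruction a few images at a time to cap peak memory."""
    return [
        process_batch(image_paths[i:i + chunk_size])
        for i in range(0, len(image_paths), chunk_size)
    ]

results = reconstruct_in_chunks([f"img{i:03d}.jpg" for i in range(20)])
```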

## 📚 References

- [VGGT Paper](https://arxiv.org/pdf/2507.04009)
- [Facebook Research VGGT](https://github.com/facebookresearch/vggt)
- [Hugging Face Model](https://huggingface.co/facebook/VGGT-1B)

## 📚 Documentation

- **[Development Guide](DEVELOPMENT.md)** - Setting up your dev environment
- **[Publishing Guide](PUBLISHING.md)** - PyPI release process
- **[Contributing Guide](CONTRIBUTING.md)** - How to contribute
- **[API Documentation](docs/)** - Detailed API reference
- **[Examples](examples/)** - Code examples and demos

## 🚀 Release Notes

### v2.0.0 (Latest)
- ✨ Unified CLI with `vggt` command
- 📦 Professional Python packaging (PyPI-ready)
- 🌐 Gradio web interface
- 🧪 Comprehensive test suite
- 🛠️ Modern tooling (UV, Makefile, GitHub Actions)
- 📝 Complete documentation overhaul

See the [full changelog](https://github.com/jmanhype/vggt-mps/releases/tag/v2.0.0).

## ๐Ÿค Contributing

We follow a lightweight Git Flow:

- `main` holds the latest stable release and is protected.
- `develop` is the default integration branch for day-to-day work.

When contributing:

1. Create your feature branch from `develop` (`git switch develop && git switch -c feature/my-change`).
2. Keep commits focused and include tests or documentation updates when relevant.
3. Open your pull request against `develop`; maintainers will promote changes to `main` during releases.

Please open issues for bugs or feature requests before starting large efforts. Full details, testing expectations, and the release process live in [`CONTRIBUTING.md`](CONTRIBUTING.md).

## 📄 License

MIT License - see the [LICENSE](LICENSE) file for details.

## ๐Ÿ™ Acknowledgments

- Facebook Research for VGGT
- Apple for Metal Performance Shaders
- PyTorch team for MPS backend

---

**Made with ๐ŸŽ for Apple Silicon by the AI community**