https://github.com/iamfaham/model-inference-profiler
A PyTorch-based tool for profiling deep learning model inference performance, analyzing computational bottlenecks, and visualizing resource utilization.
- Host: GitHub
- URL: https://github.com/iamfaham/model-inference-profiler
- Owner: iamfaham
- Created: 2025-10-20T03:45:50.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-10-20T04:02:09.000Z (8 months ago)
- Last Synced: 2025-10-20T07:33:08.971Z (8 months ago)
- Topics: cuda, memory, pytorch, visualizations
- Language: Jupyter Notebook
- Homepage:
- Size: 177 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Model Inference Profiler
A PyTorch-based tool for profiling deep learning model inference performance, analyzing computational bottlenecks, and visualizing resource utilization.
## Overview
This project provides a comprehensive profiling solution for PyTorch models, enabling you to:
- Analyze inference performance on GPU (CUDA)
- Identify computational bottlenecks
- Visualize layer-wise execution time
- Monitor memory usage patterns
- Profile any pretrained torchvision model
## Features
- **Performance Profiling**: Track CPU and CUDA execution times for each operation
- **Memory Analysis**: Monitor memory allocation and usage across layers
- **Visual Analytics**: Generate bar charts for top time-consuming and memory-intensive operations
- **Model Summary**: Display detailed architecture information with parameter counts
- **Easy Integration**: Works with torchvision models as well as custom PyTorch models
## Requirements
```text
torch
torchvision
torchinfo
matplotlib
```
## Installation
1. Clone the repository:
```bash
git clone https://github.com/iamfaham/model-inference-profiler.git
cd model-inference-profiler
```
2. Install dependencies:
```bash
pip install torch torchvision torchinfo matplotlib
```
## Usage
### Running in Google Colab
Click the "Open in Colab" badge at the top to run the notebook directly in Google Colab with free GPU access.
### Local Execution
1. Open the Jupyter notebook:
```bash
jupyter notebook model_inference_profiler.ipynb
```
2. Ensure GPU is available:
```python
import torch
print(torch.cuda.is_available())  # Prints True when a CUDA GPU is usable
```
3. The notebook will guide you through:
   - Loading a pretrained model (default: ViT-B/16)
   - Running warm-up iterations
   - Profiling inference
   - Visualizing results
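The warm-up step in the list above can be sketched as follows. This is a minimal illustration, not the notebook's code: the small `nn.Sequential` model is a hypothetical stand-in for the pretrained network, the iteration counts are arbitrary assumptions, and the script falls back to CPU when no GPU is present.

```python
import time

import torch
from torch import nn

# Use the GPU when present; fall back to CPU so the sketch still runs.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in model; the notebook uses a pretrained torchvision model.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
).to(device).eval()

x = torch.randn(1, 3, 224, 224, device=device)


def sync() -> None:
    """Wait for queued CUDA kernels to finish; a no-op on CPU."""
    if device.type == "cuda":
        torch.cuda.synchronize()


def mean_latency_ms(warmup: int = 5, iters: int = 20) -> float:
    """Average forward-pass time in milliseconds after untimed warm-up passes."""
    with torch.no_grad():
        for _ in range(warmup):  # untimed passes absorb one-time setup costs
            model(x)
        sync()
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        sync()  # CUDA launches are asynchronous, so wait before stopping the clock
    return (time.perf_counter() - start) * 1000 / iters


latency = mean_latency_ms()
print(f"{device.type}: {latency:.3f} ms per forward pass")
```

Synchronizing before reading the clock matters on GPU because kernel launches return before the work completes; without it the timer would mostly measure launch overhead.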
### Customizing the Model
To profile a different model, change the line that loads the model:
```python
from torchvision import models

# Vision Transformer (default)
model = models.vit_b_16(weights=models.ViT_B_16_Weights.DEFAULT)

# Or try other models:
# model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
# model = models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.DEFAULT)
# model = models.mobilenet_v3_small(weights=models.MobileNet_V3_Small_Weights.DEFAULT)
```
## Output Examples
The profiler generates:
1. **Model Summary**: Detailed architecture breakdown with parameter counts and memory estimates
2. **Performance Table**: Top operations sorted by CUDA execution time
3. **CUDA Time Visualization**: Bar chart showing the most time-consuming layers
4. **Memory Usage Visualization**: Bar chart displaying memory allocation per layer
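The two bar charts can be approximated with a small ranking and plotting helper. This is a sketch under stated assumptions: `top_k` and `plot_top_ops` are hypothetical names, the input is a plain list of `(operation, value)` pairs rather than profiler objects, and the sample numbers are invented for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt


def top_k(rows, k=10):
    """Return the k (name, value) pairs with the largest values, descending."""
    return sorted(rows, key=lambda r: r[1], reverse=True)[:k]


def plot_top_ops(rows, title, xlabel, k=10):
    """Draw a horizontal bar chart of the top-k operations."""
    ranked = top_k(rows, k)
    names = [name for name, _ in ranked]
    values = [value for _, value in ranked]
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.barh(names, values)
    ax.invert_yaxis()  # largest bar at the top
    ax.set_xlabel(xlabel)
    ax.set_title(title)
    fig.tight_layout()
    return fig, ax


# Made-up numbers for illustration only, not measurements from the notebook.
sample = [("aten::gelu", 75.0), ("aten::addmm", 900.0), ("aten::softmax", 120.0)]
fig, ax = plot_top_ops(sample, "Top operations by time", "time (us)", k=2)
```

The same helper serves both charts: pass execution times for one and memory figures for the other, changing only `title` and `xlabel`. Call `fig.savefig("top_ops.png")` to write the chart to disk.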
## How It Works
1. **Model Loading**: Loads a pretrained model and moves it to GPU
2. **Warm-up**: Runs several untimed inference passes so that one-time costs (CUDA context creation, kernel selection, allocator caching) do not distort the profiled run
3. **Profiling**: Uses PyTorch's built-in profiler to capture:
   - CPU and CUDA activities
   - Operation shapes
   - Memory allocations
4. **Analysis**: Extracts and visualizes performance metrics
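The profiling step could look roughly like the sketch below, which uses `torch.profiler.profile` with shape recording and memory tracking enabled. The tiny MLP is a hypothetical stand-in for the pretrained model, and the CUDA activity is added only when a GPU is available, so the example also runs on CPU.

```python
import torch
from torch import nn
from torch.profiler import ProfilerActivity, profile

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in for the pretrained model the notebook loads.
model = nn.Sequential(nn.Linear(256, 512), nn.GELU(), nn.Linear(512, 10))
model = model.to(device).eval()
x = torch.randn(8, 256, device=device)

activities = [ProfilerActivity.CPU]
if device.type == "cuda":
    activities.append(ProfilerActivity.CUDA)

with torch.no_grad():
    for _ in range(3):  # warm-up passes stay outside the profiled region
        model(x)
    with profile(
        activities=activities,
        record_shapes=True,   # keep input shapes for each operator
        profile_memory=True,  # track tensor allocations and frees
    ) as prof:
        model(x)

events = prof.key_averages()  # one aggregated row per operator name
# Sorting by CPU time works on every device. On GPU runs the CUDA sort key
# depends on the PyTorch version ("cuda_time_total" or "device_time_total"),
# so check the profiler docs for the release you have installed.
print(events.table(sort_by="self_cpu_time_total", row_limit=10))

op_names = [evt.key for evt in events]
```

Each entry in `events` exposes the operator name as `key` along with its aggregated timings, which is the data the analysis step ranks and plots.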
## Use Cases
- **Model Optimization**: Identify bottlenecks before deployment
- **Hardware Selection**: Understand resource requirements
- **Comparative Analysis**: Compare different architectures
- **Educational**: Learn about model internals and performance characteristics
## Example Output
For ViT-B/16, you'll see:
- Total parameters: ~86.6M (86,567,656 for torchvision's `vit_b_16`)
- Top operations: matrix multiplications (`addmm` operators and `sgemm` kernels)
- Memory-intensive layers: Attention mechanisms and linear layers
## Contributing
Contributions are welcome! Feel free to:
- Add support for more profiling metrics
- Implement additional visualization options
- Extend to other frameworks
- Improve documentation
## License
This project is open source and available under the MIT License.