https://github.com/neural-bits/production-hub
Hands-on hub for learning techniques to optimize AI models and serve them efficiently in production.
machine-learning optimization production quantization
Last synced: about 1 year ago
- Host: GitHub
- URL: https://github.com/neural-bits/production-hub
- Owner: neural-bits
- Created: 2024-08-05T15:54:53.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-09-11T13:10:51.000Z (almost 2 years ago)
- Last Synced: 2025-04-23T13:22:59.866Z (about 1 year ago)
- Topics: machine-learning, optimization, production, quantization
- Language: Jupyter Notebook
- Homepage:
- Size: 43.2 MB
- Stars: 6
- Watchers: 0
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Neural Bits Production Hub
This repository collects the code and articles from the Neural Bits Newsletter that showcase:
- how to optimize and quantize models for better performance
- how to serve models efficiently in production environments at scale
## Categories
### Model Optimization
|ID| 📝 Article | 💻 Code | Details | Complexity | Tech Stack |
|--|---------|-----------------|---------|------------|----------------------|
|001| [Inference Engines Profiling](https://neuralbits.substack.com/p/3-inference-engines-for-optimal-throughput)| [Here](https://github.com/neural-bits/production-hub/tree/main/001-inference_engines) | Profile a CNN model across PyTorch, ONNX, TensorRT, and TorchCompile | 🟩🟩⬜ |Python, Jupyter|
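
To give a feel for what profiling an inference backend involves, here is a minimal sketch of a timing harness. It is not taken from the linked notebook: `profile_callable` and `dummy_inference` are hypothetical names, and the dummy function only stands in for a real model forward pass so that the example runs with the standard library alone.

```python
import statistics
import time
from typing import Callable, Dict


def profile_callable(
    fn: Callable[[], object],
    warmup: int = 5,
    runs: int = 50,
    batch_size: int = 1,
) -> Dict[str, float]:
    """Time repeated calls to `fn` and summarize latency and throughput."""
    # Warm-up calls are discarded so one-time costs (lazy initialization,
    # caches, JIT compilation) do not skew the measurements.
    for _ in range(warmup):
        fn()

    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    mean_ms = statistics.mean(latencies_ms)
    return {
        "mean_ms": mean_ms,
        "median_ms": statistics.median(latencies_ms),
        "max_ms": max(latencies_ms),
        # Samples processed per second at the measured mean latency.
        "throughput_per_s": batch_size * 1000.0 / mean_ms if mean_ms > 0 else float("inf"),
    }


def dummy_inference() -> int:
    # Hypothetical stand-in for a model forward pass.
    return sum(i * i for i in range(10_000))


if __name__ == "__main__":
    print(profile_callable(dummy_inference, warmup=3, runs=20))
```

To compare engines, the same harness would be called once per backend with a closure that runs that backend's forward pass on a fixed input. For GPU backends the closure should also wait for the device to finish (for example with `torch.cuda.synchronize()` in PyTorch) before returning, otherwise the timer only measures the kernel launch.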
### Model Deployment
|ID| 📝 Article | 💻 Code| Details | Complexity | Tech Stack |
|--|---------|------|---------|------------|----------------------|
|002| [Deploying DL models with NVIDIA Triton Inference Server]()| [Here](https://github.com/neural-bits/production-hub/tree/main/002-triton-server-cnn-deployment) | Full tutorial on how to set up and deploy ML models with Triton Inference Server | 🟩🟩🟩 |Python, Docker, Bash|
### Quantization Techniques
|ID| 📝 Article | 💻 Code | Details | Complexity | Tech Stack |
|--|---------|------|---------|------------|----------------------|