https://github.com/neural-bits/production-hub

Hands-on hub for learning techniques to optimize AI models and serve them efficiently in production.

machine-learning optimization production quantization

# Neural Bits Production Hub
This repository contains the code and articles from the Neural Bits Newsletter that showcase:
- how to optimize and quantize models for better performance
- how to serve models efficiently in production environments at scale

## Categories
### Model Optimization
|ID| 📝  Article | 💻 Code | Details | Complexity | Tech Stack |
|--|---------|-----------------|---------|------------|----------------------|
|001| [Inference Engines Profiling](https://neuralbits.substack.com/p/3-inference-engines-for-optimal-throughput)| [Here](https://github.com/neural-bits/production-hub/tree/main/001-inference_engines) | Profile a CNN model across PyTorch, ONNX, TensorRT, and `torch.compile` | 🟩🟩⬜ |Python, Jupyter|

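Profiling the same model across several engines comes down to timing an identical inference call under each backend. The snippet below is a minimal sketch of such a timing harness, not the code from the article: `benchmark` and `fake_inference` are hypothetical names, and the workload is a pure-Python stand-in so the example runs without any ML framework installed.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(fn: Callable[[], object], warmup: int = 5, runs: int = 50) -> Dict[str, float]:
    """Time repeated calls to fn and return latency statistics in milliseconds."""
    # Warm-up calls are excluded so one-off costs (lazy initialization,
    # JIT compilation, cache population) do not skew the measurements.
    for _ in range(warmup):
        fn()

    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    latencies_ms.sort()
    # Nearest-rank style index for the 95th percentile, clamped to the last sample.
    p95_index = min(runs - 1, int(0.95 * runs))
    mean_ms = statistics.mean(latencies_ms)
    return {
        "mean_ms": mean_ms,
        "p50_ms": statistics.median(latencies_ms),
        "p95_ms": latencies_ms[p95_index],
        "throughput_per_s": 1000.0 / mean_ms if mean_ms > 0 else float("inf"),
    }


def fake_inference() -> int:
    # Hypothetical stand-in workload; swap in a real forward pass here.
    return sum(i * i for i in range(10_000))


if __name__ == "__main__":
    for name, value in benchmark(fake_inference).items():
        print(f"{name}: {value:.3f}")
```

To compare engines, pass one callable per backend to `benchmark` and compare the returned dictionaries. When timing GPU inference with PyTorch, the callable should also call `torch.cuda.synchronize()` before returning, since CUDA kernels launch asynchronously and the timer would otherwise stop too early.
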
### Model Deployment
|ID| 📝  Article | 💻 Code| Details | Complexity | Tech Stack |
|--|---------|------|---------|------------|----------------------|
|002| [Deploying DL models with NVIDIA Triton Inference Server]()| [Here](https://github.com/neural-bits/production-hub/tree/main/002-triton-server-cnn-deployment) | Full tutorial on how to set up and deploy ML models with Triton Inference Server | 🟩🟩🟩 |Python, Docker, Bash|

### Quantization Techniques
|ID| 📝  Article | 💻 Code | Details | Complexity | Tech Stack |
|--|---------|------|---------|------------|----------------------|